[resolved] tFileList-Read CSV files

Six Stars

Hi Community,
I have many CSV files spread across several directories, and some file names occur in more than one directory. I want to read each file name only once: if a name is duplicated, only one of the files should be read.
Example:
D:\test\a\abc.csv, 123.csv, yud.csv
D:\test\b\rd.csv, xy.csv
D:\test\abc.csv, fty.csv
Here abc.csv is located in two places. I want to read only one of these two files.
Please help.
Thanks,
Sravanth   

Accepted Solutions
One Star

Re: [resolved] tFileList-Read CSV files

Try this:
tFileList (with the Include Subdirectories option selected) -----> tIterateToFlow -----> tUniqRow
Hope this helps.
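
For readers who want to see the logic behind that chain outside of the Talend designer, here is a minimal plain-Java sketch of the same idea: list every CSV file under a root folder recursively, then keep one path per distinct file name. This is not the code Talend generates. The class name UniqueFileNames and the method firstPerName are hypothetical, and sorting the paths so that the "winning" duplicate is predictable is an assumption, since the thread does not say which copy should be preferred. Names are compared exactly (case-sensitive).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
public class UniqueFileNames {

    // Walks root recursively and returns one path per distinct CSV file name.
    public static List<Path> firstPerName(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".csv"))
                        .sorted() // assumption: fixed order so the kept copy is predictable
                        .collect(Collectors.toList());
        }
        // Key on the bare file name; the first path seen for a name is kept.
        Map<String, Path> byName = new LinkedHashMap<>();
        for (Path p : files) {
            byName.putIfAbsent(p.getFileName().toString(), p);
        }
        return new ArrayList<>(byName.values());
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : firstPerName(root)) {
            System.out.println(p);
        }
    }
}
```

In the Talend job the same role is played by using the file name, not the full path, as the key column in tUniqRow.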

All Replies
Five Stars

Re: [resolved] tFileList-Read CSV files

You need to store the file names. Where (memory/file/database) depends on whether or not you want this de-duplication to persist across runs of your Job.
A database table of processed files may be the sensible option. You can then insert each successfully processed file and then check the database each time you pick up a new one.
If you don't have a database to hand, I always use SQLite for this type of activity.
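
As a rough illustration of the "store the file names" idea, here is a small Java sketch that persists processed names across runs. To stay dependency-free it uses a plain text file rather than the SQLite table suggested above, so treat it as a stand-in for the database approach, not as that approach itself. The class ProcessedFileLog and its methods isProcessed and markProcessed are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: remembers processed file names in a text file, one per line.
public class ProcessedFileLog {
    private final Path logFile;
    private final Set<String> seen = new HashSet<>();

    // Loads names recorded by earlier runs, if the log file already exists.
    public ProcessedFileLog(Path logFile) throws IOException {
        this.logFile = logFile;
        if (Files.exists(logFile)) {
            seen.addAll(Files.readAllLines(logFile, StandardCharsets.UTF_8));
        }
    }

    // True if this file name was recorded in this run or a previous one.
    public boolean isProcessed(String fileName) {
        return seen.contains(fileName);
    }

    // Records a name after the file has been processed successfully.
    public void markProcessed(String fileName) throws IOException {
        if (seen.add(fileName)) {
            Files.write(logFile, Collections.singletonList(fileName), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
```

The intended usage is to call isProcessed before reading each file and markProcessed only after the file has loaded successfully, which mirrors the insert-then-check pattern described above.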
Six Stars

Re: [resolved] tFileList-Read CSV files

Hi Alan,
Thanks for reply.
Can you please explain this in terms of a Talend implementation? Show me which components I should use, and in what sequence, with a screenshot if possible.
Thanks,
Sravanth 
Six Stars

Re: [resolved] tFileList-Read CSV files

Thanks, Manish. Your suggestion makes a lot of sense.