Talend data integration for File operation

I need to use the Talend data integration tool for the following scenario.
There will be a flat file containing 10 million records. We will take each record, apply some business logic, and store the result in a separate file, Data.txt.
Data.txt now contains 10 million records.
There is another file, newdata.txt, containing 10K records. We want to check whether each of these records is in Data.txt or not.
Case 1: If the record exists, update it in Data.txt.
Case 2: If the record doesn't exist in Data.txt, insert it as a new record in Data.txt.
I would like help finding out how this can be done using Talend data integration.
Please suggest a way to proceed.

Re: Talend data integration for File operation

The components you'll want to look at are:
tFileInputDelimited (assuming your files are delimited; there are other variants)
tFileOutputDelimited
tMap, to join your data and perform your transformations
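
To make the join concrete, here is a minimal sketch of the update-or-insert logic in plain Python rather than Talend. It assumes semicolon-delimited files whose first field is a unique key; the function name `upsert_file` and that file layout are illustrative assumptions, not details from the question. Only the small file is held in memory, while the large file is streamed row by row.

```python
import csv

def upsert_file(data_path, new_path, out_path, delimiter=";"):
    """Stream data_path, replacing rows whose key appears in new_path,
    then append the rows from new_path that matched nothing.
    Assumes the first field of every row is a unique key."""
    # Load the small file (about 10K rows in the scenario) into a dict.
    with open(new_path, newline="") as f:
        pending = {row[0]: row for row in csv.reader(f, delimiter=delimiter) if row}

    updated = inserted = 0
    with open(data_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter=delimiter)
        for row in csv.reader(src, delimiter=delimiter):
            if not row:
                continue  # skip blank lines
            # Case 1: the key exists, so write the new version of the record.
            replacement = pending.pop(row[0], None)
            if replacement is not None:
                writer.writerow(replacement)
                updated += 1
            else:
                writer.writerow(row)
        # Case 2: keys never seen in data_path are inserted at the end.
        for row in pending.values():
            writer.writerow(row)
            inserted += 1
    return updated, inserted
```

Writing to a separate output file and renaming it over Data.txt afterwards avoids modifying the large file in place.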

Re: Talend data integration for File operation

Yes, I know I can use these components. The problem is that my lookup file will be large, so it will be loaded completely into memory when I use the tMap component and will consume a lot of resources. Is there any other way of doing this?

Re: Talend data integration for File operation

OK, but that's not what you asked.
If you've got enough memory, then increase the heap and do just that.
tMap has join options that let you "Store temp data" on disk. It will be slower but will conserve memory.
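
The memory-versus-speed trade-off can be illustrated outside Talend. The sketch below is not how tMap implements "Store temp data"; it only shows the general idea of keeping a lookup on disk instead of in the heap, using Python's standard `shelve` module. The function names, the `;` delimiter, and the first-field key are assumptions made for the example.

```python
import csv
import shelve

def build_disk_lookup(lookup_path, store_path, delimiter=";"):
    """Copy a delimited file into an on-disk key/value store keyed by the
    first field. Returns the number of rows written."""
    count = 0
    # flag="n" always creates a fresh, empty store.
    with shelve.open(store_path, flag="n") as store, open(lookup_path, newline="") as f:
        for row in csv.reader(f, delimiter=delimiter):
            if row:
                store[row[0]] = row
                count += 1
    return count

def lookup_many(store_path, keys):
    """Fetch records for the given keys from the disk store.
    Keys that are absent map to None."""
    with shelve.open(store_path, flag="r") as store:
        return {key: store.get(key) for key in keys}
```

Each lookup is a disk read rather than a hash-table hit, which is why this style is slower but keeps heap usage small.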

Re: Talend data integration for File operation

Hi all,
I think using a text file is a rather optimistic way to search data, because there are no indexes.
How about a table?
Regards,
laurent

Re: Talend data integration for File operation

I think it depends on the use case.
If you have two text files and you want to join them once, then I think it is perfectly acceptable, although in the case of Talend it can be memory hungry.
There may be no indexes, but there are also none of the other overheads of an RDBMS.
In this particular case, I can see that performing a lookup against 10M rows requires some thought.
Without knowing the source of these files, how often they change, and how often the Job runs, it's difficult to recommend loading them into a database.
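
For comparison, if the data did live in a table, the update-or-insert step would reduce to one statement per record, with the primary key acting as the index. This is a minimal sketch using Python's built-in `sqlite3` module; the table name `data` and the columns `id` and `payload` are hypothetical, and a real Job would target whichever RDBMS is actually available.

```python
import sqlite3

def upsert_records(conn, records):
    """Insert each (id, payload) pair, replacing the row when the id
    already exists. The PRIMARY KEY gives an indexed lookup on id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data (id TEXT PRIMARY KEY, payload TEXT)"
    )
    # INSERT OR REPLACE covers both cases: update if present, insert if not.
    conn.executemany(
        "INSERT OR REPLACE INTO data (id, payload) VALUES (?, ?)",
        records,
    )
    conn.commit()
```

Whether this is worth it still depends on the points above: a one-off join may not justify the cost of loading 10M rows first.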