use case 10 "Merging two files, line by line" in the wiki


I was astounded by this example, because it makes merging two files with this tool look extremely cumbersome. That is probably an unfair conclusion, but the author does the product no service with this example. (No offense intended, though it will probably be taken as such.)
The gist of the example is to demonstrate merging two files line by line in alternating sequence. The solution is to first assign sequence numbers to the lines; that is, to instruct the tool on collation.
Why the hell should one go through that? Why not simply read a line from the first source file and write it to a target file, then read a line from the second file and write that to the target file? Continue to read alternately from each source as the target file is written. As a variation, provide the option to discard or retain null or blank lines in the merge.
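To make the suggestion concrete, here is a minimal sketch of the alternating read in plain Python, not as a TOS job. The names `alternate_lines`, `merge_files` and `keep_blank` are made up for illustration, and the sketch assumes that when one file is longer, its remaining lines are simply appended.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel meaning "this source has run out of lines"


def alternate_lines(lines_a, lines_b, keep_blank=True):
    """Yield lines alternately from two line sources.

    Assumption: once the shorter source is exhausted, the rest of the
    longer one is passed through unchanged.
    """
    for pair in zip_longest(lines_a, lines_b, fillvalue=_MISSING):
        for line in pair:
            if line is _MISSING:
                continue  # nothing left in this source
            if not keep_blank and not line.strip():
                continue  # discard empty or whitespace-only lines
            yield line


def merge_files(path_a, path_b, path_out, keep_blank=True):
    """Merge two text files line by line into path_out (hypothetical helper)."""
    with open(path_a) as fa, open(path_b) as fb, open(path_out, "w") as out:
        for line in alternate_lines(fa, fb, keep_blank):
            # Make sure a last line without a newline does not run into the next one.
            out.write(line if line.endswith("\n") else line + "\n")
```

Calling `merge_files("a.txt", "b.txt", "merged.txt", keep_blank=False)` would interleave the two files and drop blank lines; the file names are placeholders.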
Any tool has its constraints, but if this example is the 'best' method to achieve that simple goal, then it leaves something to be desired. I seek open and frank discussion, not confrontation, please. If my expressions are too aggressive or offensive, that is my faulty use of language.

Re: use case 10 "Merging two files, line by line" in the wiki

You're right, use case 10 shows that this simple operation is currently not so simple to achieve with TOS. You're obviously not the first one to tell us so. We've very recently implemented the lookup on sorted data flow (see 3373), and this should help us solve the "merge files line by line" problem.
Performing a line-by-line merge would be very easy if we only had to deal with 2 files. But in TOS, we want to forget the data source once the data is in the flow. For example, it makes no difference in a lookup whether the data comes from a database, an RSS feed or a file. Adding this constraint to the equation makes the problem much more complicated. But maybe we should solve the problem only for files, as a first step. It should cover 95% of user needs.
Would you like a dedicated tFileMerge component? (If so, please create a feature request in the bugtracker.)
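The source-agnostic constraint can be illustrated outside TOS with a short Python sketch. It is not how TOS implements anything; it only shows the general idea of tagging each row with a sequence number and then merging flows that are already sorted on that number, so the merge step never needs to know where the rows came from. The names `tag_with_sequence` and `merge_flows` are hypothetical.

```python
import heapq


def tag_with_sequence(rows, source_index):
    """Attach a sort key (row position, source index) to every row of a flow."""
    for position, row in enumerate(rows):
        yield (position, source_index), row


def merge_flows(*flows):
    """Interleave any number of row flows, whatever their origin.

    Each tagged flow is already sorted by its key, so heapq.merge can
    combine them lazily without loading everything into memory.
    """
    tagged = [tag_with_sequence(flow, i) for i, flow in enumerate(flows)]
    # Compare on the key only, so the rows themselves need not be comparable.
    for _key, row in heapq.merge(*tagged, key=lambda item: item[0]):
        yield row
```

For example, `list(merge_flows(["a1", "a2", "a3"], ["b1"]))` gives `["a1", "b1", "a2", "a3"]`. The inputs can be any iterables (file objects, database cursors, parsed feed entries), which is the point of the constraint, at the cost of the extra tagging step.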

