Five Stars

how to set tMap or tFileOutputDelimited to dump each row as it process

I have this data with 2 record types, HEADER and LINE

 table.jpg

 

I want each row to be written to the output in the order it is processed:

HEADER;h1b;hcc;hcd
LINE;h1lb;h1lc;h1ld;h1le;h1lf
HEADER;h2b;h2c;h3d
LINE;h2lb;h2lc;h2ld;h2le;h2lf
LINE;h2l2;h2lc;h2ld;h2le;h2lf

 

But it seems to be buffering the rows, grouping them by record type, and writing them like this:

 

HEADER;h1b;hcc;hcd
HEADER;h2b;h2c;h3d
LINE;h1lb;h1lc;h1ld;h1le;h1lf
LINE;h2lb;h2lc;h2ld;h2le;h2lf
LINE;h2l2;h2lc;h2ld;h2le;h2lf

 

Here is my job setup -

BufferedRecords.jpg

 

I would appreciate any help/suggestion, thank you.

 

Muru

1 ACCEPTED SOLUTION

Ten Stars

Re: how to set tMap or tFileOutputDelimited to dump each row as it process

Like this:

Talend muru example.png

8 REPLIES
Nine Stars TRF

Re: how to set tMap or tFileOutputDelimited to dump each row as it process

As you can see, each flow after tFilterRow has an execution order; that is why you get the headers first, then the lines.

Using a tMap, add a sequence to record the original order of the rows.

At the end, start a new subjob (connect with onSubjobOk to tFileInputDelimited), sort the records using the sequence field, then remove it.
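
To make the idea concrete, here is a minimal plain-Java sketch of the same three steps: tag each row with a sequence, let the rows get grouped by record type, then sort on the sequence and drop it. It is only a simulation, not code generated by Talend; the class and method names are hypothetical, and the rank is simply written as the first ';'-separated field. In a real job the rank would come from a tMap variable (for example the `Numeric.sequence` routine, which is an assumption here since the thread does not say which expression was used).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration in plain Java; not Talend-generated code.
class RankSortSketch {

    // First subjob: prefix each row with its original position, then route it
    // by record type. The two flows are written one after the other, so the
    // result is grouped (all HEADER rows, then everything else).
    static List<String> tagAndGroup(List<String> input) {
        List<String> headers = new ArrayList<>();
        List<String> others = new ArrayList<>();
        int rank = 0;
        for (String row : input) {
            rank++;
            String tagged = rank + ";" + row;
            if (row.startsWith("HEADER;")) {
                headers.add(tagged);
            } else {
                others.add(tagged);
            }
        }
        List<String> grouped = new ArrayList<>(headers);
        grouped.addAll(others);
        return grouped;
    }

    // The rank is the integer before the first ';'.
    static int rankOf(String taggedRow) {
        return Integer.parseInt(taggedRow.substring(0, taggedRow.indexOf(';')));
    }

    // Second subjob: sort on the rank, then remove the rank field.
    static List<String> sortAndStrip(List<String> grouped) {
        List<String> sorted = new ArrayList<>(grouped);
        sorted.sort(Comparator.comparingInt(RankSortSketch::rankOf));
        List<String> result = new ArrayList<>();
        for (String row : sorted) {
            result.add(row.substring(row.indexOf(';') + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "HEADER;h1b;hcc;hcd",
            "LINE;h1lb;h1lc;h1ld;h1le;h1lf",
            "HEADER;h2b;h2c;h3d",
            "LINE;h2lb;h2lc;h2ld;h2le;h2lf",
            "LINE;h2l2;h2lc;h2ld;h2le;h2lf");
        sortAndStrip(tagAndGroup(input)).forEach(System.out::println);
    }
}
```

The rank is compared as an integer rather than as text, so row 10 sorts after row 9 instead of after row 1.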


TRF
Five Stars

Re: how to set tMap or tFileOutputDelimited to dump each row as it process

Thank you, TRF, for the quick response. Could you please show the design in a picture? I am new to Talend, so I am not sure how to set this up.

 

Muru

Nine Stars TRF

Re: how to set tMap or tFileOutputDelimited to dump each row as it process

Here is the job design. I removed tFilterRow and moved the filtering operation into the tMap:

Capture.PNG

 

Here is the tMap:

Capture.PNG

 

The sequence is calculated for each row using a local variable, which is used to populate the new field called Rank.

Finally, tSortRow to reorder the file using the Rank field:

Capture.PNG

 

tFilterRow to remove the Rank field:

Capture.PNG

 

And the final CSV output file:

Capture.PNG

 

It looks like the input file, so my question is: what is the objective of this job?


TRF
Five Stars

Re: how to set tMap or tFileOutputDelimited to dump each row as it process

Thank you so much, TRF. I left out the additional transformations for simplicity's sake. The main point is that the input is an Excel file containing multiple schemas, each of which can have a different number of columns. The output should be '~' delimited, but my design was adding extra '~' characters to the header rows because they have fewer columns than the line rows.

I set up the job as shown below, but nothing flows into the subjob's input file.

The only difference is that for the out24 expression I used row1.A.equalsIgnoreCase("LINE") instead of !row1.A.equalsIgnoreCase("HEADER") as you had, because I am planning to add more record subtypes such as "SUBLINE". The LINE records do show up in the output window, so I am not sure why they are not in the file; the file is empty. I am probably missing something simple.
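
For reference, a small plain-Java sketch of how the two filter expressions behave once a third record type exists. The class and method names are hypothetical; only the two `equalsIgnoreCase` expressions come from this thread, and `a` stands in for `row1.A`.

```java
import java.util.List;

// Hypothetical illustration of the two tMap output filter expressions.
class RecordTypeFilterSketch {

    // Same test as row1.A.equalsIgnoreCase("LINE"): accepts LINE rows only.
    static boolean matchesLine(String a) {
        return a.equalsIgnoreCase("LINE");
    }

    // Same test as !row1.A.equalsIgnoreCase("HEADER"): accepts everything
    // that is not a header, including any record type added later.
    static boolean notHeader(String a) {
        return !a.equalsIgnoreCase("HEADER");
    }

    public static void main(String[] args) {
        for (String type : List.of("HEADER", "LINE", "line", "SUBLINE")) {
            System.out.println(type
                + " matchesLine=" + matchesLine(type)
                + " notHeader=" + notHeader(type));
        }
    }
}
```

With only HEADER and LINE records the two expressions select the same rows. They diverge for "SUBLINE": the positive test rejects it, while the negated test lets it through, so the positive form is the safer choice if each subtype is meant to get its own output.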

 

sss.jpg

 

 

Just to clarify, this is the output I want (ignore the '~' I mentioned; whether the delimiter is '~' or ';' does not matter):

HEADER;h1b;hcc;hcd
LINE;h1lb;h1lc;h1ld;h1le;h1lf
HEADER;h2b;h2c;h3d
LINE;h2lb;h2lc;h2ld;h2le;h2lf
LINE;h2l2;h2lc;h2ld;h2le;h2lf

Nine Stars TRF

Re: how to set tMap or tFileOutputDelimited to dump each row as it process

Can you share the configuration for tFileInputDelimited_1?

It sounds like the field separator is not the same as in tFileOutputDelimited_1 & 2.

In that case, all the values are concatenated into the first field, which is expected to be an integer. That is why you get no rows and the "For input string" message in the console.
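
The effect of a separator mismatch can be reproduced in a few lines of plain Java. This is only a sketch: the class and method names are hypothetical, and the ';' versus '~' pairing is an assumption, since the actual component settings are not shown in the thread.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of reading a rank field with the wrong separator.
class SeparatorMismatchSketch {

    // Split the line on the given separator and parse the first field as the
    // integer rank, roughly what an input component with an Integer first
    // column has to do.
    static int readRank(String line, String separator) {
        String[] fields = line.split(Pattern.quote(separator), -1);
        return Integer.parseInt(fields[0]);
    }

    public static void main(String[] args) {
        String line = "1;HEADER;h1b;hcc;hcd"; // written with ';' as the separator

        // Matching separator: the first field is "1" and parses cleanly.
        System.out.println(readRank(line, ";"));

        // Mismatched separator: the whole line becomes the first field, so
        // parsing it as an integer fails with a "For input string" message.
        try {
            readRank(line, "~");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());
        }
    }
}
```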


TRF
Five Stars

Re: how to set tMap or tFileOutputDelimited to dump each row as it process

Actually, you are right: fixing the delimiter resolved it. However, because the input for the second job comes from two schemas, it takes the schema with the most columns, so the header rows end up with extra delimiters because they have fewer columns/fields.
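
A minimal plain-Java sketch of where those extra delimiters come from, assuming that missing columns are written as empty strings when a narrower row is pushed through a wider schema. The class name, method name, and schema width of 6 are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of writing a short row through a wider schema.
class PaddedSchemaSketch {

    // Pad the row to the schema width (assumes schemaWidth >= values.length),
    // write missing columns as empty strings, and join with ';'.
    static String writeRow(String[] values, int schemaWidth) {
        String[] padded = Arrays.copyOf(values, schemaWidth); // missing columns are null
        for (int i = 0; i < padded.length; i++) {
            if (padded[i] == null) {
                padded[i] = "";
            }
        }
        return String.join(";", padded);
    }

    public static void main(String[] args) {
        String[] header = {"HEADER", "h1b", "hcc", "hcd"};
        // The 4-column header goes through the 6-column LINE schema and
        // picks up two trailing delimiters.
        System.out.println(writeRow(header, 6));
    }
}
```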

 

I think I have a temporary solution: both tMaps write to a file with the same schema (rank, allColumns), as shown below. I was wondering whether there is a way to avoid the extra delimiters while still using auto map for the columns, as in your original response; that would be handy because I am going to use more fields.

 

concat.jpg

Five Stars

Re: how to set tMap or tFileOutputDelimited to dump each row as it process

I finally got a solution: I prefixed the second column with another delimiter ('|'), so that the output produced by the first job effectively has only two columns, rank and everything else.

The input for the second job then sees only those two columns:

 

split.jpg

 

The field separator for the input of the second job is set to that combination, i.e. ";|".
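
For anyone who wants to see the trick outside of Talend, here is a rough plain-Java sketch of it. The class and method names are hypothetical; only the ";|" separator and the "rank plus everything else" layout come from the posts above. Note that in Java `String.split` takes a regular expression, so the separator is quoted to keep '|' from being read as alternation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of the two-column (rank, everything else) file.
class TwoColumnRankSketch {

    static final String SEPARATOR = ";|";

    // First job: write the rank, the combined separator, then the already
    // delimited payload, whatever its number of columns.
    static String write(int rank, String payload) {
        return rank + SEPARATOR + payload;
    }

    // Split a row into exactly two parts on the literal ";|".
    static String[] split(String row) {
        return row.split(Pattern.quote(SEPARATOR), 2);
    }

    // Second job: read the two columns, sort on the rank, keep only the payload.
    static List<String> readSortStrip(List<String> rows) {
        List<String[]> parsed = new ArrayList<>();
        for (String row : rows) {
            parsed.add(split(row));
        }
        parsed.sort(Comparator.comparingInt((String[] p) -> Integer.parseInt(p[0])));
        List<String> result = new ArrayList<>();
        for (String[] p : parsed) {
            result.add(p[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Grouped by record type, as the first job writes them.
        List<String> file = List.of(
            write(1, "HEADER;h1b;hcc;hcd"),
            write(3, "HEADER;h2b;h2c;h3d"),
            write(2, "LINE;h1lb;h1lc;h1ld;h1le;h1lf"),
            write(4, "LINE;h2lb;h2lc;h2ld;h2le;h2lf"),
            write(5, "LINE;h2l2;h2lc;h2ld;h2le;h2lf"));
        readSortStrip(file).forEach(System.out::println);
    }
}
```

Because the payload travels as a single column, header rows are never padded to the width of the line rows, which is what removes the extra delimiters.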

Thank you TRF, your solution pointed me in the right direction.