I have this data with 2 record types, HEADER and LINE,
and I want the output to be written in the same order as the records are processed,
but it seems it's buffering and grouping each record type, and dumping them like this:
Here is my job setup -
I would appreciate any help/suggestion, thank you.
As you can see, each flow after tFilterRow has an execution order, and each one is written out completely before the next starts. That's the reason why you get the headers first, then the lines.
Using a tMap, add a sequence to remember the original record order.
At the end, start a new subjob (connect with OnSubjobOk to tFileInputDelimited), sort the records using the sequence field, then remove it.
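To make the idea concrete outside of Talend, here is a minimal plain-Java sketch of the same approach: tag each row with a sequence number, let the two flows be written one after the other, then sort on the sequence and drop it. The class name, the `;` separator and the sample rows are assumptions for illustration only; this is not the code Talend generates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RankReorder {

    // A row tagged with its original position (the extra "Rank" field).
    static final class Ranked {
        final int rank;
        final String line;

        Ranked(int rank, String line) {
            this.rank = rank;
            this.line = line;
        }
    }

    // Splits rows into a HEADER flow and an "everything else" flow,
    // then restores the input order by sorting on the rank.
    static List<String> splitAndRestore(List<String> input) {
        List<Ranked> headers = new ArrayList<>();
        List<Ranked> others = new ArrayList<>();
        int seq = 0;
        for (String row : input) {
            Ranked tagged = new Ranked(++seq, row);
            String type = row.split(";", 2)[0]; // assumed ';' separator
            if (type.equalsIgnoreCase("HEADER")) {
                headers.add(tagged);
            } else {
                others.add(tagged);
            }
        }
        // Mimics the grouped output: all headers first, then all lines.
        List<Ranked> staged = new ArrayList<>(headers);
        staged.addAll(others);

        // Second step: sort on the rank, then drop it.
        staged.sort(Comparator.comparingInt((Ranked r) -> r.rank));
        List<String> restored = new ArrayList<>();
        for (Ranked r : staged) {
            restored.add(r.line);
        }
        return restored;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, not taken from the original file.
        List<String> input = Arrays.asList(
                "HEADER;1001", "LINE;1001;A;2", "HEADER;1002", "LINE;1002;B;5");
        splitAndRestore(input).forEach(System.out::println);
    }
}
```

In the job itself, the tagging is done in the tMap and the sort by tSortRow in the second subjob.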
Thank you TRF for the quick response. Could you please show the design in a picture? I am new to Talend, so I am not sure how to set this up.
Here is the job design. I removed tFilterRow and included the filtering operation in the tMap:
Here is the tMap:
The sequence is calculated for each row using a local variable, which is used to populate the new field called Rank.
Finally, tSortRow to reorder the file using the Rank field:
tFilterRow to remove the Rank field:
And the final CSV output file:
Looks like the input file, so my question is: what is the objective of this job?
Thank you so much TRF. Actually, I did not show all the additional transformations for simplicity's sake. The main thing is that the input is in Excel and has multiple schemas, each of which can have a different number of columns. The output should end up '~' delimited, but my design was adding extra '~' delimiters to the header rows because they have fewer columns than the line rows.
I set up the job as below, but nothing is flowing into the subjob input file -
The only difference is that for the out24 expression I had row1.A.equalsIgnoreCase("LINE") instead of !row1.A.equalsIgnoreCase("HEADER") as you had, as I am planning to add more record subtypes like "SUBLINE". It seems to show the LINE type records in the output window, though; I am not sure why they are not showing in the file, which is empty. I am missing something simple, I guess.
Just to clarify, this is the output I want (ignore the '~' I mentioned; whether the delimiter is '~' or ';' does not matter).
Can you share the configuration for tFileInputDelimited_1?
It sounds like the field separator is not the same as in tFileOutputDelimited_1 & 2.
In this case, all the values are concatenated into the 1st field, which is expected to be an integer. That is the reason why you get no rows and the "For input string" message in the console.
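To illustrate where that message comes from, here is a small standalone Java sketch, not the Talend-generated code. It assumes, purely for illustration, that the file was written with `;` and read back with `~`; the thread does not say which separators were actually configured. With the wrong separator the whole line lands in the first field, and parsing it as an integer fails.

```java
import java.util.regex.Pattern;

final class SeparatorMismatch {

    // Splits a raw line on the given separator and parses the first
    // field as the integer rank, as the reader side would have to do.
    static int parseRank(String rawLine, String separator) {
        String[] fields = rawLine.split(Pattern.quote(separator), -1);
        return Integer.parseInt(fields[0]);
    }

    public static void main(String[] args) {
        String written = "1;HEADER;1001"; // hypothetical row written with ';'

        // Matching separator: the first field is "1" and parses fine.
        System.out.println(parseRank(written, ";"));

        // Mismatched separator: the first field is the whole line.
        try {
            parseRank(written, "~");
        } catch (NumberFormatException e) {
            // The exception message starts with "For input string".
            System.out.println(e.getMessage());
        }
    }
}
```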
Actually you are right. After fixing the delimiter, it's resolved. However, since the input for the second job comes from 2 schemas, it takes the schema with the most columns, so the header rows end up with additional delimiters because they have fewer columns/fields.
I think I have a temporary solution: both tMaps dump into a file with the same schema (rank, allColumns), as shown below. I was wondering if there is a way to avoid the extra delimiters while still using auto map for the columns, as you did in the original response, since I am going to use more fields and that would be handy.
Finally got a solution. I added another delimiter to the second column (a '|' prefix), so that the output produced by the first job effectively has only 2 columns: rank and everything else.
The input for the second job sees only those 2 columns,
and the field separator for the input of the second job is set to that combination, i.e. ";|".
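For anyone reading later, the trick can be sketched in plain Java as below. This is only an illustration of the two-column staging idea, with hypothetical class, method and sample values; in the job it is done with the tMap output and the field separator setting of the second job's input component.

```java
import java.util.regex.Pattern;

final class TwoColumnStaging {

    // Separator between the rank and the rest of the row: the normal ';'
    // followed by the '|' prefix added to the second column.
    static final String STAGING_SEPARATOR = ";|";

    // First job: write the rank, then all remaining columns joined with ';'.
    static String toStaging(int rank, String... columns) {
        return rank + STAGING_SEPARATOR + String.join(";", columns);
    }

    // Second job: split on the literal ";|" only, so exactly 2 fields come
    // back no matter how many columns the original record type had.
    static String[] fromStaging(String stagedLine) {
        return stagedLine.split(Pattern.quote(STAGING_SEPARATOR), 2);
    }

    public static void main(String[] args) {
        // Hypothetical rows: a short HEADER and a longer LINE.
        String header = toStaging(1, "HEADER", "1001");
        String line = toStaging(2, "LINE", "1001", "A", "2");

        // The second field keeps each row's own column count, so the
        // header gets no padding delimiters.
        System.out.println(fromStaging(header)[1]);
        System.out.println(fromStaging(line)[1]);
    }
}
```

One caveat of this approach is that the data itself must never contain the ";|" combination, otherwise the split point becomes ambiguous.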
Thank you TRF, your solution pointed me in the right direction.