tFileOutputDelimited

Dear community,

 

I have the following scenario: I want to write a large number of rows (a few million) to multiple CSV files (chunks) using tFileOutputDelimited, because I do not want a single big file. The files have to be compressed with gzip and uploaded to the cloud. Because disk space is limited, I cannot export everything first and then compress and upload; each chunk must be compressed and uploaded before the next chunk is exported.

 

There are two ways to do this (option 2 is what we currently use, but unfortunately it cannot be used in this case):

option 1) Use the split option in tFileOutputDelimited and define how many rows are written before a new chunk is created.

option 2) Build a loop that limits the number of rows exported per iteration and sends them to tFileOutputDelimited (just like in this community post), instead of fetching all rows at once as in option 1.

 

The second option is off the table: it is already in use and does not work for one specific input. So I want to use option 1, and I am facing two problems with it:

problem 1) How do I "interrupt" the export workflow each time a new file has been created? The idea is that once a CSV file is full, I compress it and upload it. Only when these steps are done should the workflow continue with the next chunk.

As an indicator of whether the file has been written, I thought I could use NB_LINE in a tJavaRow component, but it never gives the correct value.

// inside tJavaRow component
int exportedLines = (Integer)globalMap.getOrDefault("tFileOutputDelimited_1_NB_LINE", -1);
logger.info(exportedLines); // this value is always -1
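For the "compress it" step in problem 1, here is a minimal sketch of gzipping one finished chunk and deleting the uncompressed original to free disk space. It uses only java.util.zip and java.nio.file from the JDK, not a Talend component. The class name ChunkGzipper and the method gzipAndDelete are hypothetical, and the sketch assumes it is only called once the chunk file has been fully written and closed; how to detect that moment inside the job is a separate question.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not part of Talend): gzips one finished chunk file.
class ChunkGzipper {

    /**
     * Compresses source into "<source>.gz" in the same directory,
     * deletes the uncompressed source, and returns the path of the .gz file.
     */
    static Path gzipAndDelete(Path source) throws IOException {
        Path target = source.resolveSibling(source.getFileName() + ".gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            // Streams the file into the gzip stream without loading it into memory.
            Files.copy(source, out);
        }
        // Remove the original only after the gzip stream was closed successfully.
        Files.delete(source);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temporary file standing in for one chunk.
        Path chunk = Files.createTempFile("output_parts_", ".csv");
        Files.write(chunk, "id;name\n1;foo\n".getBytes(StandardCharsets.UTF_8));
        Path gz = gzipAndDelete(chunk);
        System.out.println("compressed to " + gz);
        System.out.println("original still exists: " + Files.exists(chunk));
    }
}
```

Because the original is deleted only after the compressed file is complete, a failure during compression leaves the CSV chunk in place. The upload step would follow the same pattern: upload the .gz file, then delete it.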

problem 2) How do I get the file name of each chunk per iteration? If I use (String)globalMap.getOrDefault("tFileOutputDelimited_1_FILE_NAME", "") I do not get the current file name (e.g. "output1.csv") but only the base name (e.g. "output.csv"), which is of course useless.
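One possible workaround for both problems is to count rows yourself instead of relying on the component's global variables. As far as I know, NB_LINE is an "After" variable that is only set once the component has finished, which would explain the -1 seen while rows are still flowing. The sketch below is plain Java with a hypothetical class name (ChunkTracker). It assumes the chunk files are named base name + zero-based chunk index + extension; that naming scheme and the starting index are assumptions and should be checked against the files the job actually writes.

```java
// Hypothetical helper (not a Talend class): tracks which chunk the current row belongs to.
class ChunkTracker {
    private final String baseName;   // e.g. ".../output_parts_"
    private final String extension;  // e.g. ".csv"
    private final int splitEvery;    // rows per chunk, e.g. context.splitValue
    private long rowsSeen = 0;

    ChunkTracker(String baseName, String extension, int splitEvery) {
        if (splitEvery <= 0) {
            throw new IllegalArgumentException("splitEvery must be positive");
        }
        this.baseName = baseName;
        this.extension = extension;
        this.splitEvery = splitEvery;
    }

    /** Registers one row; returns true when that row is the last row of its chunk. */
    boolean onRow() {
        rowsSeen++;
        return rowsSeen % splitEvery == 0;
    }

    /** Zero-based index of the chunk that contains the most recently registered row. */
    long currentChunkIndex() {
        return rowsSeen == 0 ? 0 : (rowsSeen - 1) / splitEvery;
    }

    /** ASSUMED naming scheme: base name + chunk index + extension. */
    String currentChunkFileName() {
        return baseName + currentChunkIndex() + extension;
    }

    public static void main(String[] args) {
        // Mirrors the example job: 20 rows, split after every 10 rows.
        ChunkTracker tracker = new ChunkTracker("output_parts_", ".csv", 10);
        for (int row = 1; row <= 20; row++) {
            if (tracker.onRow()) {
                System.out.println("row " + row + " completes " + tracker.currentChunkFileName());
            }
        }
        // prints:
        // row 10 completes output_parts_0.csv
        // row 20 completes output_parts_1.csv
    }
}
```

In a job, such a class could live in a custom routine with one instance stored in globalMap and onRow() called from the tJavaRow. Note that a true result only says the row count for a chunk has been reached; whether tFileOutputDelimited has already flushed and closed that file at that moment is not guaranteed by this sketch and needs to be verified before compressing.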

 

To give a simple example, I created a job (screenshots and output are attached).

I generated 20 rows and split them after every 10 rows (context.splitValue = 10).

 

Attached screenshots: csv_split_01_rowGenerator.PNG, csv_split_02_tfileoutputdelimited.PNG, csv_split_03_tfileoutputdelimited.PNG, csv_split_04_tJavaRow.PNG

 

Starting job csv_split at 10:57 09/07/2019.

[INFO ]: 2019-07-09 10:57:47.289: exampleproject.csv_split_0_1.csv_split - TalendJob: 'csv_split' - Start.
[INFO ]: 2019-07-09 10:57:47.312: exampleproject.csv_split_0_1.csv_split - tRowGenerator_1 - Generating records.
[INFO ]: 2019-07-09 10:57:47.324: exampleproject.csv_split_0_1.csv_split - COkHZN6rQbfSKQkiyDpc
[INFO ]: 2019-07-09 10:57:47.324: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.324: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.325: exampleproject.csv_split_0_1.csv_split - voqTGAveWNE1vPmk27IG
[INFO ]: 2019-07-09 10:57:47.325: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.325: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.325: exampleproject.csv_split_0_1.csv_split - Uro6V1mMc9GGU2zNbKoW
[INFO ]: 2019-07-09 10:57:47.325: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.325: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.325: exampleproject.csv_split_0_1.csv_split - 9zLYTweojfujJBQHIhwE
[INFO ]: 2019-07-09 10:57:47.325: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.325: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.325: exampleproject.csv_split_0_1.csv_split - X7vWPFtsElvzYGGqUbML
[INFO ]: 2019-07-09 10:57:47.326: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.326: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.326: exampleproject.csv_split_0_1.csv_split - cZY58I6DVpNozScCdq7N
[INFO ]: 2019-07-09 10:57:47.326: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.326: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.327: exampleproject.csv_split_0_1.csv_split - XmnIcEE1h8fQ1qb9oKZI
[INFO ]: 2019-07-09 10:57:47.327: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.327: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.327: exampleproject.csv_split_0_1.csv_split - BzniCF3NwmTKQNyBNyn0
[INFO ]: 2019-07-09 10:57:47.327: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.327: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.328: exampleproject.csv_split_0_1.csv_split - jyiQz42NEcmoogh1WQfh
[INFO ]: 2019-07-09 10:57:47.328: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.328: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.328: exampleproject.csv_split_0_1.csv_split - u8Kq2wZKdkTa8UN4do7Q
[INFO ]: 2019-07-09 10:57:47.328: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.328: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - KNkzAp84zgAFQ8J8kKcI
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - j0NO25wzgipFH3Qw2Rqf
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - KpUbSmUzdYdt02xrxw6f
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - JIHp6bC12qgYXtb5uvXy
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - sXKn3bzLMYRvTaqNRoM8
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.330: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.331: exampleproject.csv_split_0_1.csv_split - E2khPzfD7fP8AqkDYmGB
[INFO ]: 2019-07-09 10:57:47.331: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.331: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.331: exampleproject.csv_split_0_1.csv_split - hRSvfDqUdXMTzAIGlTsD
[INFO ]: 2019-07-09 10:57:47.331: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.331: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.331: exampleproject.csv_split_0_1.csv_split - MTaXco4n07MBcLY91ufG
[INFO ]: 2019-07-09 10:57:47.331: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.331: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.331: exampleproject.csv_split_0_1.csv_split - fWwhK3ZdTt1vPbmUUuZ5
[INFO ]: 2019-07-09 10:57:47.331: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.333: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.333: exampleproject.csv_split_0_1.csv_split - d19amx7tuG3b7hofQz5L
[INFO ]: 2019-07-09 10:57:47.333: exampleproject.csv_split_0_1.csv_split - C:/Users/user/Downloads/talend_output/csv_export/output_parts_
[INFO ]: 2019-07-09 10:57:47.333: exampleproject.csv_split_0_1.csv_split - -1
[INFO ]: 2019-07-09 10:57:47.333: exampleproject.csv_split_0_1.csv_split - tRowGenerator_1 - Generated records count:20 .
30 milliseconds
[INFO ]: 2019-07-09 10:57:47.334: exampleproject.csv_split_0_1.csv_split - TalendJob: 'csv_split' - Done.

Job csv_split ended at 10:57 09/07/2019. [exit code=0]

Thanks in advance!

TT

Moderator

Re: tFileOutputDelimited

Hello,

If you want to get the file name of each chunk per iteration, have a look at the tFlowToIterate component, which reads data line by line from the input flow and stores the data entries in iterative global variables.

For more information, please see the component reference in the Talend Help Center: tFlowToIterate.

Best regards

Sabrina
