One Star

Create multiple output files based on an unknown number of keys

I am trying to split one flow into multiple outputs based on a key. There are an unknown number of keys, and every row with the same key should be written to a tFileOutputDelimited corresponding to that key. Any thoughts on how to go about doing this?
One Star

Re: Create multiple output files based on a unknown number or keys

Hi,
1) Calculate the number of distinct key values in the input data.
2) Store this value in a variable.
3) Use an orchestration component such as tFlowToIterate or tLoop for the looping mechanism.
4) Because the number of keys is dynamic, use the variable from step 2 as the number of iterations.
5) Inside the loop, use: input component --> filter --> tFileOutputDelimited.
6) Parameterize the filter and the tFileOutputDelimited file name with the current key, so that each key's rows go to the corresponding file.
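To make the logic of these steps concrete outside of Talend, here is a minimal plain-Java sketch. It is not the job design itself: it groups the rows in a single pass instead of re-reading the input once per key, and the class name, the ";" delimiter and the "<key>.csv" naming are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class KeySplitSketch {

    // Group delimited rows by the value in column keyIndex, keeping first-seen key order.
    static Map<String, List<String>> groupByKey(List<String> rows, String delimiter, int keyIndex) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            // Quote the delimiter so characters such as "|" are not read as regex syntax.
            String key = row.split(Pattern.quote(delimiter), -1)[keyIndex];
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    // Write one file per key into outputDir; "<key>.csv" is an illustrative naming choice.
    static void writeGroups(Map<String, List<String>> groups, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        for (Map.Entry<String, List<String>> entry : groups.entrySet()) {
            Path target = outputDir.resolve(entry.getKey() + ".csv");
            Files.write(target, entry.getValue(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> rows = List.of("A;1", "B;2", "A;3");
        Map<String, List<String>> groups = groupByKey(rows, ";", 0);
        writeGroups(groups, Files.createTempDirectory("split"));
        System.out.println(groups.keySet());
    }
}
```

The number of keys never has to be known up front here, because the map grows as new keys appear. In the Talend job the same effect comes from iterating over the distinct keys.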
Best Regards,
Mayur
One Star

Re: Create multiple output files based on a unknown number or keys

Thanks for the response,
I think I understand the concept you put forward, but I'm not sure how to implement it.
1) How do I store a value from the flow in a variable?
2) How is tFlowToIterate used?
I'm new to Talend. Thanks for your help!
--
Jeff
One Star

Re: Create multiple output files based on a unknown number or keys

Hi Jeff,
For your 1st question:
# Use the tUniqueRow component to identify the unique records in the input.
# Pass these records to a tMap component.
# In the tMap, create a variable (v_RowCount) with Integer as its data type and increase its value by one for each row.
# Once the flow has completed, the tMap variable (v_RowCount) holds the number of unique records in the file.
# You can use this variable in your further logic.
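As a rough illustration of what the tUniqueRow plus counter combination computes, here is a small plain-Java sketch. The class and method names are hypothetical and this is not Talend-generated code. It only shows the idea of de-duplicating the keys and taking the size of the result as the iteration count.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DistinctKeyCount {

    // Return the distinct keys in first-seen order; the set's size is the iteration count.
    static Set<String> distinctKeys(List<String> keys) {
        return new LinkedHashSet<>(keys);
    }

    public static void main(String[] args) {
        Set<String> keys = distinctKeys(List.of("2015", "2016", "2015", "2014"));
        System.out.println(keys.size() + " distinct keys: " + keys);
    }
}
```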
For your 2nd question: for help with any component, use the following steps:
# Drag and drop the component into the work area.
# Press F1.
# A link will appear in the Help window.
# Click that link; it takes you to the help section containing the information specific to that component.
# Below those details there are also case studies, which help in understanding how to build a job using the component.
Try out these things and let me know in case you face any issues.
Best Regards,
Mayur

Re: Create multiple output files based on a unknown number or keys

One little trick that might be helpful here:
Almost all components update a globalMap key <component_name>_NB_LINE with the number of rows the component has processed. You can retrieve this value with a call like this (substituting your component name, of course):
(Integer)globalMap.get("tOracleInput_1_NB_LINE")

This can be very useful when you want to know how many rows have gone through a component.
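One caveat with the cast above is that it yields null if the key has not been set yet. As far as I know, NB_LINE is only filled in once the component has finished. A minimal null-safe sketch follows, using an ordinary HashMap as a stand-in for the globalMap that a real job provides. The helper name rowCount is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class NbLineSketch {

    // Read a row count from a globalMap-style map, returning 0 when the key is absent.
    static int rowCount(Map<String, Object> globalMap, String componentName) {
        Object value = globalMap.get(componentName + "_NB_LINE");
        return value == null ? 0 : (Integer) value;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that a Talend job supplies at run time.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tOracleInput_1_NB_LINE", 42);

        System.out.println(rowCount(globalMap, "tOracleInput_1"));
        System.out.println(rowCount(globalMap, "tFileInputDelimited_1"));
    }
}
```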
One Star

Re: Create multiple output files based on a unknown number or keys

Hi,
I've written a new tutorial on "how to split a file into many files according to a key on each record", which explains how to solve this kind of task. It is currently available only in French. The tutorial shows 3 different techniques for achieving this.
Hope it can be useful.
One Star

Re: Create multiple output files based on a unknown number or keys

Here is how to write a file for each row using a unique key.
0) Create a unique key on each row, or use an existing one.
1) Read the file once to feed the keys into a loop (tFlowToIterate).
2) On the second read, embedded in the iteration, filter on the iterator key.
3) Set the file name to use the iterator key and the current date-time stamp.
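For step 3, the file name is just a string built from the current key and a timestamp. Here is a minimal plain-Java sketch of that expression. The prefix, the timestamp pattern and the helper name are assumptions. In the job itself the key would typically be read from globalMap after tFlowToIterate, for example (String)globalMap.get("row1.key"), where row1 and key are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class KeyedFileName {

    // Timestamp layout for the file name; the pattern is an illustrative choice.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    // Build "<prefix><key>_<timestamp>.csv" for the given key and point in time.
    static String fileNameFor(String prefix, String key, LocalDateTime when) {
        return prefix + key + "_" + when.format(STAMP) + ".csv";
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("out_", "A17", LocalDateTime.now()));
    }
}
```

Passing the timestamp in as a parameter, rather than calling now() inside the helper, keeps the name reproducible when the same run writes several files.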

One Star

Re: Create multiple output files based on a unknown number or keys

Hi,
I would like to generate multiple files based on year, and all these files have to be stored automatically in their corresponding year folders.
Please find below screenshots for my requirement:
Moderator

Re: Create multiple output files based on a unknown number or keys

Hi ashajyothi.ece,
Can you upload again the screenshots you wanted to show, please? For some reason it didn't make it to your post.
Best regards
Sabrina
Fifteen Stars

Re: Create multiple output files based on a unknown number or keys

I've written a tutorial that I think covers this requirement:
https://www.rilhia.com/tutorials/load-data-dynamic-number-files
Rilhia Solutions
One Star

Re: Create multiple output files based on a unknown number or keys

Hi Sabrina,
Thank you for your quick response.
As I am unable to upload screenshots, I am describing my requirement here:
I have a huge CSV file with a date column. My data covers the years 2000 to 2016.
All dates are in the format "DD-MM-YYYY".
I would like to store each year's data in a separate CSV file. For example, all 2016 data should be stored automatically in one CSV file and all 2015 data in another, under the paths below:
(E drive/output/year/2016)
(E drive/output/year/2015) ... and so on.
I hope this makes my requirement clear. It would be very helpful if you could provide a job design with screenshots, as I am very new to Talend.
Kindly let me know if you have any questions.
Thanks in advance.
Kind regards,
Asha
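For the per-year requirement above, the key is simply the year taken from the date column, and the output path is built from it. Here is a minimal plain-Java sketch of those two expressions. The class name, the relative base directory and the "<year>.csv" file name are assumptions, not part of the original requirement.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class YearFolderSketch {

    // Matches the "DD-MM-YYYY" layout described in the post.
    private static final DateTimeFormatter DD_MM_YYYY = DateTimeFormatter.ofPattern("dd-MM-yyyy");

    // Extract the year (the split key) from a date string such as "31-12-2016".
    static int yearOf(String date) {
        return LocalDate.parse(date, DD_MM_YYYY).getYear();
    }

    // Build <baseDir>/<year>/<year>.csv; the file name is an illustrative choice.
    static Path outputFileFor(Path baseDir, String date) {
        int year = yearOf(date);
        return baseDir.resolve(String.valueOf(year)).resolve(year + ".csv");
    }

    public static void main(String[] args) {
        Path base = Paths.get("output", "year");
        System.out.println(outputFileFor(base, "31-12-2016"));
        System.out.println(outputFileFor(base, "01-02-2015"));
    }
}
```

In a job, the same year value could drive both the filter and the file name of the output component, with the option to create missing directories enabled if the component offers one.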
One Star

Re: Create multiple output files based on a unknown number or keys

Hi,
Can anyone explain how to update records in a CSV file?
My requirement is:
I have two CSV files with the same columns, id and name.
(old.csv)
id  name
1   Asha
2   Jyothi
(New.csv)
id  name
1   Jyothi
2   Raj
3   Vinay
So I need to update the records for ids 1 and 2 with the new values and insert the 3rd record.
Please let me know if you have any questions.
Kind Regards,
Asha
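The update-or-insert described above amounts to overlaying the new rows on the old ones by id. A minimal plain-Java sketch of that merge, using the example data from the post, could look like this. The class and method names are hypothetical, and reading or writing the CSV files is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CsvUpsertSketch {

    // Rows in newRows replace rows with the same id; ids not seen before are appended.
    static Map<Integer, String> upsert(Map<Integer, String> oldRows, Map<Integer, String> newRows) {
        Map<Integer, String> merged = new LinkedHashMap<>(oldRows);
        merged.putAll(newRows);
        return merged;
    }

    public static void main(String[] args) {
        Map<Integer, String> oldRows = new LinkedHashMap<>();
        oldRows.put(1, "Asha");
        oldRows.put(2, "Jyothi");

        Map<Integer, String> newRows = new LinkedHashMap<>();
        newRows.put(1, "Jyothi");
        newRows.put(2, "Raj");
        newRows.put(3, "Vinay");

        System.out.println(upsert(oldRows, newRows));
    }
}
```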
One Star

Re: Create multiple output files based on a unknown number or keys

I have the same scenario, where I need to write data into files according to the value of a column, like
(E drive/output/year/2016)
(E drive/output/year/2015)
where 2015 and 2016 arrive in my data as column values.
I am able to achieve this with a standard job.
But I am having trouble doing this with a Big Data Batch job (Spark).
Please suggest how to do it, or any other optimized way of doing it with a Spark Batch job.
One Star

Re: Create multiple output files based on a unknown number or keys


I achieved this by writing custom Java code that saves the RDD using a multiple-file output format.
You can also use the DataFrame writer's partitionBy method when saving.
Br 
Anuj