
Split input file into multiple outputs

Hi
I am a newbie and have been unable to find the answer to my problem on the forum. Any help would be appreciated.
The scenario I need to resolve is as follows:
The input file is a CSV file that contains a batch header record followed by 'n' data rows; the last record in the file is a simple batch trailer record. I need to split this single file into separate output files based on the content of the data rows, with each file name derived dynamically from that content, and recreate a new batch header and trailer record for each new file. The number of files created will vary with the content of the original input file.
For example:
input.txt
BH,all animals
mammal,cow
reptile,cobra
mammal,horse
bird,sparrow
reptile,crocodile
BT,5
Output for the above example assuming the split is on the first field would be:
mammal.txt
BH,mammal
cow
horse
BT,2
reptile.txt
BH,reptile
cobra
crocodile
BT,2
bird.txt
BH,bird
sparrow
BT,1

Any assistance on how to structure the Java job would be much appreciated.
Regards
Andre
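A minimal plain-Java sketch of the split described above, assuming the layout of the example: the first line starts with "BH,", the last with "BT,", and every data row is "key,value", where the key becomes the file name (<key>.txt). The class name SplitByFirstField and the command-line arguments are hypothetical. Rows are grouped in a LinkedHashMap so the files come out in first-seen order, and each group gets a rebuilt "BH,<key>" header and a "BT,<count>" trailer. Keys are used as file names unchanged, so real data may need sanitising first.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitByFirstField {

    /**
     * Groups the data rows by their first field and wraps each group in a new
     * "BH,<key>" header and "BT,<count>" trailer.
     * Returns output file name -> file lines, in first-seen order.
     */
    static Map<String, List<String>> split(List<String> lines) {
        // Collect the data rows per key, skipping the original BH/BT records.
        Map<String, List<String>> rowsByKey = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("BH,") || line.startsWith("BT,")) {
                continue;
            }
            int comma = line.indexOf(',');
            if (comma < 0) {
                throw new IllegalArgumentException("No delimiter in row: " + line);
            }
            String key = line.substring(0, comma);
            String rest = line.substring(comma + 1); // the key itself is not repeated in the output
            rowsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(rest);
        }

        // Rebuild header and trailer for every group.
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : rowsByKey.entrySet()) {
            List<String> out = new ArrayList<>();
            out.add("BH," + e.getKey());
            out.addAll(e.getValue());
            out.add("BT," + e.getValue().size());
            files.put(e.getKey() + ".txt", out);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical arguments: input file and output directory.
        Path input = Paths.get(args.length > 0 ? args[0] : "input.txt");
        Path outDir = Paths.get(args.length > 1 ? args[1] : ".");
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        for (Map.Entry<String, List<String>> f : split(lines).entrySet()) {
            Files.write(outDir.resolve(f.getKey()), f.getValue(), StandardCharsets.UTF_8);
        }
    }
}
```

Run against input.txt, this should produce mammal.txt, reptile.txt and bird.txt with the contents shown in the expected output. Inside a Talend job the same grouping would typically be done with components rather than a standalone class.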
Community Manager

Re: Split input file into multiple outputs

Hello
I think it is difficult to produce files in that format in a single step. The header line 'BH,mammal' has to be written manually.
Best regards
shong
----------------------------------------------------------
Talend | Data Agility for Modern Business

Re: Split input file into multiple outputs

Hi,
We have a requirement where the file format is as follows:
H,name,add1,add2(Header data - common to all records)
C,xyz,1242654212,23.00 (details about payments)
M,Refencedata1,client data1(Child record of its previous records)
C,mno,124231987,874.00 (details about payments)
C,pqr,1242312123,45343.00 (details about payments)
M,Refencedata2,client data2(Child record of its previous records)
T, 3(final record, the trailer)
In the above file, the header record data is common to all records, each main (payment) record may also have child records, and the trailer record contains the count of total records excluding the header and trailer records. I need to insert this data into a temporary table.
Please suggest how I can map this file to a single table.
Thanks....
Community Manager

Re: Split input file into multiple outputs

Hello,
What is your expected result? Are there any rules in your data?
Best regards
shong

Re: Split input file into multiple outputs

Hi Shong,
Thanks for your response.
Here is our requirement.
The file will be in the following format:
H,name,add1,add2(Header data - common to all records)
C,xyz,1242654212,23.00 (details about payments)
M,Refencedata1,client data1(Child record of its previous records)
C,mno,124231987,874.00 (details about payments)
C,pqr,1242312123,45343.00 (details about payments)
M,Refencedata2,client data2(Child record of its previous records)
T, 3(final record, the trailer)
I have to parse it and insert it into a temporary table (e.g. TempTable).
Each row will contain the header data, the payment data and the child record data, for example:
name, add1, add2, xyz,1242654212,23.00, Refencedata1,client data1
name, add1, add2,mno,124231987,874.00
name, add1, add2,pqr,1242312123,45343.00,Refencedata2,client data2
Once the data is in the temp table I can do the data cleanup etc., so I won't apply any rules here. What I need is how to map the fields of this hierarchical file to the table columns.
Thanks...
Hi Shong, any update on this? This is an urgent requirement, so it would be great if you could help us.
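One way to get the rows listed above is a single pass that remembers the H values, starts a new row at each C record, fills the child columns from an M record, and emits the row when the next C record or the end of the file is reached. This is a minimal sketch, assuming the parenthesised comments in the sample are not part of the real file, that each C record has at most one M record, and a fixed layout of 3 header, 3 payment and 2 child columns; FlattenPayments and its constants are hypothetical names. Missing child values stay null so every row has the same number of columns for TempTable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenPayments {

    // Hypothetical fixed layout: 3 header + 3 payment + 2 child columns.
    static final int HEADER_COLS = 3;
    static final int PAYMENT_COLS = 3;
    static final int CHILD_COLS = 2;
    static final int ROW_COLS = HEADER_COLS + PAYMENT_COLS + CHILD_COLS;

    /** Copies up to count fields (skipping the record-type field) into dest at destPos. */
    private static void copyFields(String[] fields, String[] dest, int destPos, int count) {
        for (int i = 0; i < count && i + 1 < fields.length; i++) {
            dest[destPos + i] = fields[i + 1];
        }
    }

    /** Turns H/C/M/T lines into one flat row per C record; child columns stay null without an M. */
    static List<String[]> flatten(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        String[] header = new String[HEADER_COLS]; // H values, repeated on every row
        boolean headerSeen = false;
        String[] current = null;                    // row for the most recent C record

        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] f = line.split(",", -1);
            for (int i = 0; i < f.length; i++) {
                f[i] = f[i].trim(); // "T, 3" -> "T", "3"
            }
            switch (f[0]) {
                case "H":
                    copyFields(f, header, 0, HEADER_COLS);
                    headerSeen = true;
                    break;
                case "C":
                    if (!headerSeen) throw new IllegalStateException("C record before H record");
                    if (current != null) rows.add(current); // previous payment is complete
                    current = new String[ROW_COLS];
                    System.arraycopy(header, 0, current, 0, HEADER_COLS);
                    copyFields(f, current, HEADER_COLS, PAYMENT_COLS);
                    break;
                case "M":
                    if (current == null) throw new IllegalStateException("M record without a preceding C");
                    copyFields(f, current, HEADER_COLS + PAYMENT_COLS, CHILD_COLS);
                    break;
                case "T":
                    break; // trailer: nothing to map; the last row is flushed below
                default:
                    throw new IllegalArgumentException("Unknown record type: " + f[0]);
            }
        }
        if (current != null) rows.add(current);
        return rows;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "H,name,add1,add2",
            "C,xyz,1242654212,23.00",
            "M,Refencedata1,client data1",
            "C,mno,124231987,874.00",
            "T, 3");
        for (String[] row : flatten(sample)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each String[] can then be bound to an INSERT on TempTable; inside Talend the equivalent state would live in a tJavaRow or in context variables.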

Re: Split input file into multiple outputs

Hi Shong,
Any update on this? It would be great if you could help us, as this is an urgent requirement.
Thanks,
Ashok

Re: Split input file into multiple outputs

How do I convert a string to a double or long data type? Please help.

Re: Split input file into multiple outputs

Hi srinikpisoft,
please open a new thread if you have a new question. This will also increase your chance of getting an answer ;-)
To convert your string you should use Double.parseDouble() or Long.parseLong().
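A short illustration of the two calls. Both throw NumberFormatException for text they cannot parse, for example Long.parseLong("12.5"). The null-returning helpers toDouble and toLong are hypothetical conveniences for blank fields, not Talend or JDK methods. For currency amounts such as 23.00, java.math.BigDecimal avoids binary rounding if that matters for your data.

```java
public class ParseNumbers {

    /** Hypothetical helper: blank -> null, otherwise Double.parseDouble on the trimmed text. */
    static Double toDouble(String s) {
        if (s == null || s.trim().isEmpty()) {
            return null;
        }
        return Double.parseDouble(s.trim());
    }

    /** Hypothetical helper: blank -> null, otherwise Long.parseLong on the trimmed text. */
    static Long toLong(String s) {
        if (s == null || s.trim().isEmpty()) {
            return null;
        }
        return Long.parseLong(s.trim()); // parseLong itself rejects surrounding spaces
    }

    public static void main(String[] args) {
        double amount = Double.parseDouble("23.00");
        long account = Long.parseLong("1242654212");
        System.out.println(amount + " " + account); // prints "23.0 1242654212"
        try {
            Long.parseLong("12.5"); // a decimal point is not valid for a long
        } catch (NumberFormatException e) {
            System.out.println("Not a long: " + e.getMessage());
        }
    }
}
```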
Bye
Volker

Re: Split input file into multiple outputs

Hi techinfo.forum80,
I hope my answer in this thread will help you out: 1473
Bye
Volker

Re: Split input file into multiple outputs

Hi Volker,
Thanks a lot for your response.
I have a few questions:
From the above file I have removed the header and trailer records and passed on only the remaining records (payment records and their child records).
How can we map them to the table?
For example, here is my file:
H,name,add1,add2(Header data - common to all records)
C,xyz,1242654212,23.00 (details about payments)
M,Refencedata1,client data1(Child record of its previous records)
C,mno,124231987,874.00 (details about payments)
C,pqr,1242312123,45343.00 (details about payments)
M,Refencedata2,client data2(Child record of its previous records)
T, 3(final record, the trailer)
After removing the header and trailer, it will be:
C,xyz,1242654212,23.00 (details about payments)
M,Refencedata1,client data1(Child record of its previous records)
C,mno,124231987,874.00 (details about payments)
C,pqr,1242312123,45343.00 (details about payments)
M,Refencedata2,client data2(Child record of its previous records)
I have to parse it and insert it into a temporary table (e.g. TempTable).
Each row should contain the payment data and its child record data, for example:
xyz,1242654212,23.00, Refencedata1,client data1
mno,124231987,874.00
pqr,1242312123,45343.00,Refencedata2,client data2
How can we implement this with Talend components? What I need is how to map this to the table columns (with tMap or any other component).
I hope my question is clear.
Thanks,
Ashok.

Re: Split input file into multiple outputs

Hi Ashok,
do you always have exactly one detail record with a predefined mapping?
If so, you have two solutions (otherwise only the second one).
Solution one with "information transfer between two flows":
a) Read your file with tFileInputRegex ("^(.),(.*)$"). You now have two values: the row type and the data.
b) Split the stream in a tMap depending on the row type. The "C" row must be first in order.
c) In each output stream, decompose the data with tExtractDelimitedFields.
d) In the flow of row "C", add a tJavaRow and set predefined context variables for the data you need in flow "M" (for example, context.accountNumber = input_row.accountNumber).
e) In the flow of row "M", add a tJavaRow and add the values you need to the output (output_row.accountNumber = context.accountNumber). You must define these variables in the output schema, which will then have more columns than the input. In this case you can ignore the warning that tJavaRow shows.
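Outside Talend, steps a) to e) can be mimicked in plain Java to see how the data flows. This sketch uses the same regular expression as step a) via java.util.regex, splits the data part as tExtractDelimitedFields would in step c), and uses static fields as a stand-in for the context variables of steps d) and e); the class and field names are hypothetical. It only emits the enriched "M" rows, so a "C" row without a child (mno in Ashok's example) does not appear here and would need its own output flow. A sequential loop automatically processes each "C" row before the "M" rows that follow it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TypedRowCarry {

    // Step a): group 1 = one-character row type, group 2 = the rest of the line.
    private static final Pattern ROW = Pattern.compile("^(.),(.*)$");

    // Stand-ins for Talend context variables (hypothetical names):
    // written in the "C" flow, read in the "M" flow.
    static String ctxName;
    static String ctxAccountNumber;
    static String ctxAmount;

    /** Returns one row per "M" record: the carried "C" fields followed by the "M" fields. */
    static List<String[]> process(List<String> lines) {
        ctxName = ctxAccountNumber = ctxAmount = null; // nothing carried over from a previous run
        List<String[]> mFlow = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (!m.matches()) {
                continue; // not of the form "<type>,<data>"
            }
            String type = m.group(1);
            // Steps b) and c): route on the type and decompose the data part.
            String[] data = m.group(2).split(",", -1);
            if (type.equals("C")) {
                if (data.length < 3) throw new IllegalArgumentException("Short C row: " + line);
                // Step d): remember the payment values.
                ctxName = data[0];
                ctxAccountNumber = data[1];
                ctxAmount = data[2];
            } else if (type.equals("M")) {
                if (ctxName == null) throw new IllegalStateException("M row before any C row");
                if (data.length < 2) throw new IllegalArgumentException("Short M row: " + line);
                // Step e): add the remembered values to the M output row.
                mFlow.add(new String[] {ctxName, ctxAccountNumber, ctxAmount, data[0], data[1]});
            }
            // H and T rows are not used in this sketch.
        }
        return mFlow;
    }
}
```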

Bye
Volker