One Star

[resolved] Split a single row to multiple rows (CSV File).

Hy everyone!.
I'm using talend studio 3.2.0 with WinXP SP3. I have a csv file that has following fields in it.
Timestamp, param1,param2,param3,param1,param2,param3,param1,param2,param3
2009-08-15,1,2,3,4,5,6,7,8,9
Basically i want to seperate each record into 3 different records like this.
Timestamp,param1,param2,param3
2009-08-15,1,2,3
2009-08-15,4,5,6
2009-08-15,7,8,9
Can this be achieved using talend etl tool? if yes what components should be used. I'm entirely new to talend and considering it as an alternative to Pentaho kettle.
1 ACCEPTED SOLUTION

Accepted Solutions
Community Manager

Re: [resolved] Split a single row to multiple rows (CSV File).

Hello
Here is the demo:
in.csv:

Timestamp,param1,param2,param3,param1,param2,param3,param1,param2,param3
2009-08-15,1,2,3,4,5,6,7,8,9
2009-09-12,A,B,C,D,E,F,G,H,I

code on tJavaRow:
output_row.timestamp = input_row.timestamp;
output_row.conn= input_row.p1+","+input_row.p2+","+input_row.p3+"@"+input_row.p4+","+input_row.p5+","+input_row.p6+"@"+input_row.p7+","+input_row.p8+","+input_row.p9;

Result:
Starting job forum8666 at 16:31 28/10/2009.
connecting to socket on port 3947
connected
.----------+------+------+------.
| tLogRow_1 |
|=---------+------+------+-----=|
|timestamp |param1|param2|param3|
|=---------+------+------+-----=|
|2009-08-15|1 |2 |3 |
|2009-08-15|4 |5 |6 |
|2009-08-15|7 |8 |9 |
|2009-09-12|A |B |C |
|2009-09-12|D |E |F |
|2009-09-12|G |H |I |
'----------+------+------+------'
disconnected
Job forum8666 ended at 16:31 28/10/2009.

Best regards

shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
5 REPLIES
Community Manager

Re: [resolved] Split a single row to multiple rows (CSV File).

Hello guy
Yes, it is easy to finish it with talend etl tool. I will show you a demo soon...
Best regards

shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
Community Manager

Re: [resolved] Split a single row to multiple rows (CSV File).

Hello
Here is the demo:
in.csv:

Timestamp,param1,param2,param3,param1,param2,param3,param1,param2,param3
2009-08-15,1,2,3,4,5,6,7,8,9
2009-09-12,A,B,C,D,E,F,G,H,I

code on tJavaRow:
output_row.timestamp = input_row.timestamp;
output_row.conn= input_row.p1+","+input_row.p2+","+input_row.p3+"@"+input_row.p4+","+input_row.p5+","+input_row.p6+"@"+input_row.p7+","+input_row.p8+","+input_row.p9;

Result:
Starting job forum8666 at 16:31 28/10/2009.
connecting to socket on port 3947
connected
.----------+------+------+------.
| tLogRow_1 |
|=---------+------+------+-----=|
|timestamp |param1|param2|param3|
|=---------+------+------+-----=|
|2009-08-15|1 |2 |3 |
|2009-08-15|4 |5 |6 |
|2009-08-15|7 |8 |9 |
|2009-09-12|A |B |C |
|2009-09-12|D |E |F |
|2009-09-12|G |H |I |
'----------+------+------+------'
disconnected
Job forum8666 ended at 16:31 28/10/2009.

Best regards

shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: [resolved] Split a single row to multiple rows (CSV File).

hey that's freakin GREAT man, I liked the '@' delimeter approach you have used.
Thanks for taking out time and mentioning it in so much detail. This tool really provides endless possibilities in the ETL domain. One just needs to use the building blocks efficiently.
Best Regards,
Umar
One Star

Re: [resolved] Split a single row to multiple rows (CSV File).

The other option (which I used for a file) is to create three separate outputs with a tMap and then merge them with a tUnite. Doesn't require hand coding.
Seven Stars

Re: [resolved] Split a single row to multiple rows (CSV File).

Rather than using tUnite, you can unite them directly in the tMap: when creating the second and third outputs within tMap just select the "Create join table from" option rather than the default "New output" option. There will then be only one flow out of the tMap but it will include the rows from both of the output tables within tMap.
However, from v4.2.0, the tSplitRow component handles the original requirement very well.