Four Stars

How to parse an input round robin into multiple outputs

I am trying to load an input file into Redshift, and I want to split the file round robin before loading it so that the load can use the computational power of multiple slices in my cluster. How do I split an input into n outputs in a round-robin fashion using Talend?

 

Ex:

Input:

id     name
1      Jon
2      Anne
3      Cole
4      Zack
5      Ellen

Output:

Main1
1      Jon
4      Zack

Main2
2      Anne
5      Ellen

Main3
3      Cole

3 REPLIES
Ten Stars

Re: How to parse an input round robin into multiple outputs

You can create three tMap outputs, each with its own filter condition:
rowX.id % 3 == 0
rowX.id % 3 == 1
rowX.id % 3 == 2
Then send each output to a separate file.
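
To see how those three filters divide the sample rows, here is a minimal standalone Java sketch. It is not Talend-generated code: the class and method names are made up for illustration, and it assumes the ids are non-negative integers (a negative id would give a negative remainder in Java).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the tMap filters id % n == 0 .. n-1.
class ModuloSplitSketch {
    // Returns n buckets; a row with the given id lands in bucket (id % n).
    // Assumes every id is non-negative.
    static List<List<Integer>> split(int[] ids, int n) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int id : ids) {
            buckets.get(id % n).add(id);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 4, 5}; // ids from the example input
        List<List<Integer>> buckets = split(ids, 3);
        for (int r = 0; r < buckets.size(); r++) {
            System.out.println("id % 3 == " + r + " -> " + buckets.get(r));
        }
    }
}
```

With the example ids, remainder 1 collects 1 and 4, remainder 2 collects 2 and 5, and remainder 0 collects 3, which matches Main1, Main2, and Main3 in the question.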
Four Stars

Re: How to parse an input round robin into multiple outputs

Thank you for the reply. I thought about doing that, but I actually need 6 outputs (I put down 3 in my question to simplify the problem). With this method, rowX.id % 3 == 0, rowX.id % 2 == 0, and rowX.id % 6 == 0 are all true when the id is divisible by 6, so I can't think of a simple filter that splits the input 6 ways.

Ten Stars

Re: How to parse an input round robin into multiple outputs

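You can create six outputs and change the expression to mod 6: rowX.id % 6 == 0 for the first output, rowX.id % 6 == 1 for the second, and so on through rowX.id % 6 == 5. Every id has exactly one remainder mod 6, so each row matches exactly one output.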
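
As a quick check that the six filters never overlap, the sketch below counts how many of the conditions id % 6 == 0 through id % 6 == 5 accept each id. It is plain Java rather than a Talend job, the names are hypothetical, and it assumes non-negative integer ids.

```java
// Hypothetical check that filters sharing one modulus are mutually exclusive.
class SixWayFilterSketch {
    // Index of the output whose filter (id % outputs == index) accepts this id.
    static int outputFor(int id, int outputs) {
        return id % outputs;
    }

    // Counts how many of the filters id % outputs == 0 .. outputs-1 accept the id.
    static int matchingFilters(int id, int outputs) {
        int matches = 0;
        for (int r = 0; r < outputs; r++) {
            if (id % outputs == r) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        for (int id = 1; id <= 12; id++) {
            System.out.println("id " + id
                    + " -> output " + outputFor(id, 6)
                    + ", filters matched: " + matchingFilters(id, 6));
        }
    }
}
```

The overlap described in the question only arises when the filters mix different moduli (% 2, % 3, % 6). With a single modulus, an id divisible by 6 goes to the remainder-0 output and nowhere else.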

Alternatively, I think you can set a row limit on tFileOutputDelimited (the option to split the output into several files), and it will split the file into chunks of that size. To get a consistent number of files, you'd need to get a record count and divide it by the number of files you want, rounding up. I can't test right now, but I'd assume it would use the sort order of the data flow, so that wouldn't give you a round robin of ids unless you added the modulo expression as a new column and then sorted by that (and secondarily by the id).
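
The arithmetic behind that alternative can be sketched in plain Java. This is only a model of the idea, not of how any Talend component actually behaves: the class and method names are hypothetical, the rows-per-file value is computed with ceiling division, and ids are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of "sort by modulo, then cut into fixed-size chunks".
class ChunkSplitSketch {
    // Ceiling division: smallest rows-per-file that fits rowCount rows into `files` files.
    static int rowsPerFile(int rowCount, int files) {
        return (rowCount + files - 1) / files;
    }

    // Sorts ids by (id % files, id), then cuts the sorted list into chunks of rowsPerFile rows.
    static List<List<Integer>> sortAndChunk(Integer[] ids, int files) {
        Integer[] sorted = ids.clone();
        Arrays.sort(sorted, Comparator
                .comparingInt((Integer id) -> id % files)
                .thenComparingInt((Integer id) -> id));
        int size = rowsPerFile(sorted.length, files);
        List<List<Integer>> chunks = new ArrayList<>();
        for (int start = 0; start < sorted.length; start += size) {
            int end = Math.min(start + size, sorted.length);
            chunks.add(new ArrayList<>(Arrays.asList(sorted).subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Six rows, three files: every remainder group has exactly two rows.
        System.out.println(sortAndChunk(new Integer[]{1, 2, 3, 4, 5, 6}, 3));
        // Five rows, three files: the remainder groups are uneven.
        System.out.println(sortAndChunk(new Integer[]{1, 2, 3, 4, 5}, 3));
    }
}
```

One caveat this model exposes: the chunks line up with the remainder groups only when every group has the same number of rows. With ids 1 to 6 the chunks are [3, 6], [1, 4], [2, 5], but with the five-row example from the question the chunks become [3, 1], [4, 2], [5], so the modulo filters in tMap are the more reliable route to a true round robin.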