How to parse an input round robin into multiple outputs

Four Stars


I am trying to load an input file into Redshift, and I want to split the file round robin before loading so that the load can use the computational power of multiple slices in my cluster. How do I split an input into n outputs in round-robin fashion using Talend?

 

Example:

Input:

id    name
1     Jon
2     Anne
3     Cole
4     Zack
5     Ellen

Output:

Main1
1     Jon
4     Zack

Main2
2     Anne
5     Ellen

Main3
3     Cole

Ten Stars

Re: How to parse an input round robin into multiple outputs

You can create three tMap outputs, each with its own filter condition on the remainder: rowX.id % 3 == 0, rowX.id % 3 == 1, and rowX.id % 3 == 2. Then send each output to a separate file.
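As a rough illustration of what the three filter conditions do, here is a minimal plain-Java sketch outside of Talend. The class name RoundRobinSplit and its methods are hypothetical, and the sketch assumes the ids are consecutive integers starting at 1. It uses (id - 1) % n so that id 1 lands in the first output, which matches the Main1/Main2/Main3 layout in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Talend API: mimics n tMap outputs filtered by remainder.
class RoundRobinSplit {

    // Zero-based output index for a row id; floorMod keeps the result non-negative.
    static int outputIndex(int id, int outputs) {
        // (id - 1) sends id 1 to output 0 (Main1), id 2 to output 1 (Main2), and so on.
        return Math.floorMod(id - 1, outputs);
    }

    // Distributes the ids over the requested number of outputs.
    static List<List<Integer>> split(List<Integer> ids, int outputs) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < outputs; i++) {
            result.add(new ArrayList<>());
        }
        for (int id : ids) {
            result.get(outputIndex(id, outputs)).add(id);
        }
        return result;
    }
}
```

With ids 1 to 5 and three outputs, split returns [1, 4], [2, 5], and [3]. If the ids have gaps, a running row counter in place of the id would keep the distribution even.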
Four Stars

Re: How to parse an input round robin into multiple outputs

Thank you for the reply. I thought about doing that, but I actually need 6 outputs (I put down 3 in my question to simplify the problem). With this method, an id divisible by 6 satisfies rowX.id % 3 == 0, rowX.id % 2 == 0, and rowX.id % 6 == 0 all at once, so I can't think of a simple filter that splits the rows 6 ways.

Ten Stars

Re: How to parse an input round robin into multiple outputs

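You can create six outputs and change the expression to mod 6, using the conditions rowX.id % 6 == 0 through rowX.id % 6 == 5. Every id has exactly one remainder modulo 6, so each row matches exactly one output.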
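To check that the six conditions never overlap, here is a small plain-Java sketch. The class ModSixFilters is hypothetical and only for illustration, and it assumes non-negative ids. It lists which of the six remainder filters accept a given id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check, not a Talend API: evaluates the six filters id % 6 == r.
class ModSixFilters {

    // Returns the remainders r in 0..5 whose filter (id % 6 == r) accepts the id.
    static List<Integer> matchingFilters(int id) {
        List<Integer> matches = new ArrayList<>();
        for (int remainder = 0; remainder < 6; remainder++) {
            if (id % 6 == remainder) {
                matches.add(remainder);
            }
        }
        return matches;
    }
}
```

An id such as 12, which is divisible by 2, 3, and 6, still matches only the remainder-0 filter, because all six filters use the same divisor.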

Alternatively, I think you can set a row limit on tFileOutputDelimited, and it will split the file into chunks of that size. To get a consistent number of files, you'd need to get a record count and divide it by the number of files you want. I can't test right now, but I'd assume it would use the sort order of the data flow. That wouldn't give you a round robin of ids unless you added the modulo expression as a new column and then sorted by that column (and secondarily by the id).
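The sort-then-chunk idea can be sketched in plain Java as follows. This is only a model of the approach, not of the component's actual behavior; the class SortedChunkSplit and its methods are hypothetical. One caveat the sketch makes visible: the chunks line up with the remainder groups only when the row count is an exact multiple of the number of files.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of "sort by modulo column, then split by row limit".
class SortedChunkSplit {

    // Row limit per file: record count divided by file count, rounded up.
    static int rowsPerFile(int rowCount, int files) {
        return (rowCount + files - 1) / files;
    }

    // Sorts by (id % files, id) and cuts the sorted rows into fixed-size chunks.
    static List<List<Integer>> split(List<Integer> ids, int files) {
        List<Integer> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.comparingInt((Integer id) -> id % files)
                .thenComparingInt(id -> id));

        int limit = rowsPerFile(sorted.size(), files);
        List<List<Integer>> chunks = new ArrayList<>();
        for (int start = 0; start < sorted.size(); start += limit) {
            int end = Math.min(start + limit, sorted.size());
            chunks.add(new ArrayList<>(sorted.subList(start, end)));
        }
        return chunks;
    }
}
```

With ids 1 to 6 and three files, the chunks are [3, 6], [1, 4], and [2, 5]. With ids 1 to 5, the first chunk becomes [3, 1], which mixes two remainder groups, so the tMap filters are the safer choice when the count does not divide evenly.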
