Talend Architecture and Distributed Data Processing

Hi all,
I'm relatively new to Talend (just playing with 2.0) but have experience with Sunopsis, PowerCenter and other ETL-like tools.
Having read the documentation and watched the forums, I'm still confused about the distributed nature of the Talend architecture. I can see how to design a job within the Talend studio and either execute it interactively from the UI or export it for distribution on a collection/grid of machines, but I can't see any support for automatically distributing the data among machines.
On a grid of, say, 10 machines with 50M records, I would obviously like to split the source data into 10 batches of 5M each, distribute them among the 10 machines and recombine the results into a target database/file. From what I can tell, that splitting is manual.
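To make the manual split concrete, here is a minimal sketch in plain Java of the row ranges that would have to be worked out by hand before giving each machine its batch. It is not Talend functionality or Talend API: the class name `BatchSplit` and the method `splitRanges` are made up for illustration, and it assumes the source rows can be addressed by a simple 0-based row number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Talend): computes the row ranges
// for a manual split of a source data set across a grid of machines.
class BatchSplit {

    // Returns one {startRow, endRowExclusive} pair per machine.
    // Any remainder rows are spread over the first batches, one row each.
    static List<long[]> splitRanges(long totalRows, int machines) {
        if (totalRows < 0 || machines <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and machines > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        long base = totalRows / machines;
        long remainder = totalRows % machines;
        long start = 0;
        for (int i = 0; i < machines; i++) {
            long size = base + (i < remainder ? 1 : 0);
            ranges.add(new long[] {start, start + size});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // The example from the post: 50M records over 10 machines.
        List<long[]> ranges = splitRanges(50_000_000L, 10);
        for (int i = 0; i < ranges.size(); i++) {
            long[] r = ranges.get(i);
            System.out.println("machine " + i + ": rows " + r[0] + " to " + (r[1] - 1));
        }
    }
}
```

Each range would then have to be turned into a filter on the source (for example a row-number or key-range condition) in a separately exported copy of the job, which is the manual step I was hoping to avoid.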
Is there any existing, or planned, support for automatically distributing a job over N machines? Similarly, is there any planned support for splitting a job over N processors/cores without first writing data to a staging area/file?
Don't take the above as a criticism, I think Talend is great - I'm just checking out what it can do and where in my toolbox to hang it. If Talend doesn't currently support the above, has anyone found a suitable workaround or solution that worked for them?
cheers,
DIGuy

Re: Talend Architecture and Distributed Data Processing

What is the difference between list and list(object) operations in tAggregateRow?

Re: Talend Architecture and Distributed Data Processing

What is the difference between byte and byte[] variable types?