It seems that components process one row at a time.
In some cases I need to micro-batch the rows, and in other cases I need to make sure I am looking at the whole data set.
How do I enforce a batch size and/or ensure that all rows are processed in one pass?
For example, if I have 2 million rows to push to a web service, I would rather send 10-50K rows per call than make 2 million separate web calls.
In this case I need to be able to provide a batch size.
Or I need to calculate some aggregate value (to keep it simple). In this case I need to see all rows before I can do the math.
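For the aggregate case, it may help to separate "seeing every row" from "holding every row in memory". Many aggregates (count, sum, average, min, max) can be computed in a single pass with constant memory and read out once the last row has been seen. The minimal, framework-free Java sketch below illustrates this for an average; the class and its onElement()/result() methods are hypothetical names, not part of any Talend API.

```java
// Hypothetical sketch, not a Talend API: a running average that sees every row
// once but keeps only two numbers in memory, regardless of how many rows arrive.
public class RunningAverage {
    private long count = 0;
    private double sum = 0.0;

    // Called once per row.
    public void onElement(double value) {
        count++;
        sum += value;
    }

    // Read the aggregate once all rows have been seen.
    public double result() {
        if (count == 0) {
            throw new IllegalStateException("no rows seen");
        }
        return sum / count;
    }

    public static void main(String[] args) {
        RunningAverage avg = new RunningAverage();
        for (double v : new double[] {2.0, 4.0, 9.0}) {
            avg.onElement(v);
        }
        System.out.println(avg.result()); // prints 5.0
    }
}
```

Aggregates that genuinely need all values at once (an exact median, for instance) do not fit this pattern and would still require buffering or a dedicated step.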
This page gives some pointers about that kind of implementation.
At a high level, the Studio defines "groups" of a particular size (assume they are "big" from the component developer's point of view). To implement chunking/bulking, you need to define a "maxSize" option in your configuration and define the following callback (a method in your processor):
This works for output components, but for transform components (that is, a component with an output, like the use case you mentioned) you would need a patched version of the Studio, since we added this after the last available release.
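As a rough, framework-free illustration of the chunking idea described above, the Java sketch below buffers rows until a configured maximum is reached, hands the full buffer off as one batch, and flushes any remainder when a group closes. The class name, the onElement()/afterGroup() methods and the Consumer standing in for the web-service call are all hypothetical; this is not the Talend Component Kit API, only the buffering pattern such a callback would implement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the chunking pattern; not the Talend Component Kit API.
public class BatchingProcessor<T> {
    private final int maxSize;               // plays the role of the "maxSize" option
    private final Consumer<List<T>> sink;    // stands in for one web-service call per batch
    private final List<T> buffer = new ArrayList<>();

    public BatchingProcessor(int maxSize, Consumer<List<T>> sink) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be > 0");
        }
        this.maxSize = maxSize;
        this.sink = sink;
    }

    // Called once per row: buffer it and send a batch as soon as the buffer is full.
    public void onElement(T row) {
        buffer.add(row);
        if (buffer.size() >= maxSize) {
            flush();
        }
    }

    // Called when a group ends: send whatever is left over (possibly a partial batch).
    public void afterGroup() {
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // pass a copy so the sink may keep it
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> callSizes = new ArrayList<>();
        BatchingProcessor<Integer> p =
                new BatchingProcessor<>(50_000, batch -> callSizes.add(batch.size()));
        for (int i = 0; i < 2_000_000; i++) {
            p.onElement(i);
        }
        p.afterGroup();
        System.out.println(callSizes.size() + " calls"); // prints "40 calls"
    }
}
```

With a maximum of 50,000 rows per batch, the 2 million rows from the question turn into 40 calls instead of 2 million; with 10,000 rows per batch it would be 200 calls.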
Which version of the studio do you rely on?
We have prepared a patch that adds the bulk processing feature to the latest Talend Open Studio 7.1.1M1 milestone release.
You will find a readme file with instructions for installing the attached patch.
Download the latest milestone release: Talend Open Studio 7.1.1M1