Four Stars

Process all rows in custom component

Hi,

It seems the components process one row at a time.

In some cases I need to micro-batch the rows, and in other cases I need to be sure I am looking at the whole data set.

 

How do I enforce a batch size and/or force all rows to be processed in one pass?

For example, if I have 2 million rows to push to a web service, I would rather send 10-50 K rows in one call than make 2 million separate web calls.

In this case I need to be able to provide a batch size.

Or I may need to calculate some aggregate value (for simplicity's sake). In this case I need to know all rows before I can do the math.

2 REPLIES
Employee

Re: Process all rows in custom component

Hi @bhupendra_patil,

 

This page gives some pointers about that kind of implementation.

 

At a high level, the studio will define "groups" of a particular size (assumed to be "big" from the component developer's point of view). To implement chunking/bulking, you need to define a "maxSize" option in your configuration and define the following callbacks (methods in your processor):

 

  1. BeforeGroup: reset the record buffer (a list)
  2. ElementListener: test whether the buffer size is >= maxSize and flush if so; otherwise buffer the current record
  3. AfterGroup: if the buffer is not empty, flush it
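
To make the three steps above concrete, here is a minimal plain-Java sketch of the buffering logic. It is not the actual component API: `BufferingProcessor`, `beforeGroup`, `onElement`, `afterGroup` and the `sink` callback are hypothetical names used only for illustration, and in a real processor these methods would be the ones marked as the BeforeGroup, ElementListener and AfterGroup callbacks. As an interpretation of step 2, the sketch buffers each record first and then checks the size, so the record that arrives when the buffer is full is not lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a processor; the names are illustrative only.
class BufferingProcessor<T> {
    private final int maxSize;               // the "maxSize" configuration option
    private final Consumer<List<T>> sink;    // assumed bulk call, e.g. one web service request
    private List<T> buffer = new ArrayList<>();

    BufferingProcessor(int maxSize, Consumer<List<T>> sink) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
        this.sink = sink;
    }

    // Step 1 (BeforeGroup): start each group with an empty buffer.
    void beforeGroup() {
        buffer = new ArrayList<>();
    }

    // Step 2 (ElementListener): buffer the record, flush once maxSize is reached.
    void onElement(T record) {
        buffer.add(record);
        if (buffer.size() >= maxSize) {
            flush();
        }
    }

    // Step 3 (AfterGroup): send whatever is left over.
    void afterGroup() {
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        sink.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
        buffer.clear();
    }
}

class BatchingSketch {
    public static void main(String[] args) {
        List<List<Integer>> calls = new ArrayList<>();
        BufferingProcessor<Integer> processor = new BufferingProcessor<>(3, calls::add);

        processor.beforeGroup();
        for (int i = 1; i <= 7; i++) {
            processor.onElement(i);
        }
        processor.afterGroup();

        System.out.println(calls); // prints [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

With 7 records and a maxSize of 3, the sink is called three times instead of seven; the same shape would turn 2 million single calls into a few dozen bulk calls at a maxSize of 10-50 K.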

 

This works for output components, but for transform components (that is, components with an output, like the use case you mentioned) you would need a patched version of the studio, since we added this after the last available release.

 

Which version of the studio do you rely on?

 

Thanks,

Romain
Talend Component Kit Documentation: https://talend.github.io/component-runtime/
Employee

Re: Process all rows in custom component

Hello,

We have prepared a patch that adds the bulk processing feature to the latest Talend Open Studio milestone release, 7.1.1M1.

 

You will find a readme file with instructions to install the attached patch.
Download the latest milestone release: Talend Open Studio 7.1.1M1

Documentation: https://github.com/Talend/component-runtime