How to execute on every 100 rows of data?

Hey,
I am pretty sure Talend should be able to do this task relatively easily, but I am not sure of the best way to go about it.
I have 100,000 rows of data, but an API I am calling can only take 100 rows of data per call.
I would like to make one API call per 100 rows until I have looped through the full 100,000-row data set.
Any advice or recommended components for going about this would be much appreciated.
Thanks,
Brian
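Stripped of the Talend components, the task is plain chunking: split the rows into consecutive groups of at most 100 and make one call per group. A minimal Java sketch, assuming the rows are already in memory as a `List`; `Batching` and `sendBatch` are hypothetical names, and `sendBatch` only stands in for the tSOAP call:

```java
import java.util.ArrayList;
import java.util.List;

public class Batching {

    // Splits rows into consecutive chunks of at most batchSize elements.
    // The last chunk holds the remainder when rows.size() is not a multiple of batchSize.
    public static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, rows.size());
            // Copy the subList view so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(from, to)));
        }
        return batches;
    }

    // Hypothetical placeholder for the real API call (tSOAP in this thread).
    static void sendBatch(List<String> batch) {
        System.out.println("sending " + batch.size() + " rows");
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 250; i++) {
            rows.add("row" + i);
        }
        for (List<String> batch : partition(rows, 100)) {
            sendBatch(batch);
        }
    }
}
```

With 100,000 rows and a batch size of 100 this gives 1,000 calls; when the row count is not an exact multiple of 100, the last batch simply holds the remainder.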

Re: How to execute on every 100 rows of data?

How are you constructing the API call and passing the data - in tJavaRow?

Re: How to execute on every 100 rows of data?

I am using the tSOAP component to perform the API call.
I am thinking of using tFlowToIterate --> something that counts to 100 --> tIterateToFlow (somehow batching the rows into groups of 100) and then executing the call once per batch.

Re: How to execute on every 100 rows of data?

I think the main problem is how to aggregate the 100 rows of data to pass into one tSOAP call. If you can do that, then calling tSOAP only on every 100th row is quite easy, e.g.:
(construct your SOAP call aggregating the 100 rows of data) --> tFilterRow in advanced mode: Numeric.sequence("s1",1,1)%100==0 --> tSOAP --> (reset your SOAP call construct to start with the next set of 100 rows)
Don't forget OnSubjobOK --> tSOAP for the rows left over after the last full multiple of 100.
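The same accumulate-and-flush idea in plain Java, as a hedged sketch: the row counter plays the part of `Numeric.sequence("s1",1,1)`, the `% 100 == 0` check is the filter, and the final `flush()` corresponds to the OnSubjobOK step for the leftover rows. `BatchAccumulator` and the `sender` callback are hypothetical names, not Talend components; inside a job, comparable logic could live in a tJavaFlex (buffer declared in the start code, leftovers flushed in the end code).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchAccumulator {
    private final int batchSize;
    private final Consumer<List<String>> sender; // hypothetical stand-in for the tSOAP call
    private final List<String> buffer = new ArrayList<>();
    private int rowCount = 0; // plays the role of Numeric.sequence("s1",1,1)

    public BatchAccumulator(int batchSize, Consumer<List<String>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    // Called once per incoming row, like a component in the main flow.
    public void accept(String row) {
        buffer.add(row);
        rowCount++;
        if (rowCount % batchSize == 0) { // the "every 100th row" filter condition
            flush();
        }
    }

    // Sends whatever is buffered and starts a new batch. Calling it after the
    // last row corresponds to the OnSubjobOK --> tSOAP step for the leftovers.
    public void flush() {
        if (!buffer.isEmpty()) {
            sender.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        List<Integer> sentSizes = new ArrayList<>();
        BatchAccumulator acc = new BatchAccumulator(100, batch -> sentSizes.add(batch.size()));
        for (int i = 1; i <= 230; i++) {
            acc.accept("row" + i);
        }
        acc.flush(); // sends the remaining 30 rows
        System.out.println(sentSizes); // [100, 100, 30]
    }
}
```

The key difference from filtering on the 100th row alone is that every row is buffered; the filter condition only decides when the buffer is sent.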

Re: How to execute on every 100 rows of data?

Yeah, I am trying to figure out how to batch the data into groups of 100 rows, but I like the idea of calling tSOAP on every 100th row.

Re: How to execute on every 100 rows of data?

I am thinking about creating a temp table or temp file and numbering the rows. Then I would use a filter (tFilterRow) to let only the rows numbered 100 or less be sent; the rest get written back to the temp file, renumbered.
Loop until the temp file/table is empty.
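The temp-table loop can be modeled in memory as a queue: each pass takes the first 100 rows, and the rows left behind are implicitly renumbered from 1 on the next pass. A sketch assuming an `ArrayDeque` stands in for the temp file/table; `TempTableLoop` and `drain` are hypothetical names:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TempTableLoop {

    // Repeatedly takes the first batchSize rows from the "temp table" and leaves
    // the rest queued, until nothing is left. Removing rows from the head means
    // the remaining rows are effectively renumbered from 1 on each pass.
    public static List<List<String>> drain(Deque<String> tempTable, int batchSize) {
        List<List<String>> sent = new ArrayList<>();
        while (!tempTable.isEmpty()) {
            List<String> batch = new ArrayList<>(batchSize);
            while (batch.size() < batchSize && !tempTable.isEmpty()) {
                batch.add(tempTable.pollFirst()); // rows numbered 1..batchSize in this pass
            }
            sent.add(batch); // the real job would make the API call here
        }
        return sent;
    }

    public static void main(String[] args) {
        Deque<String> temp = new ArrayDeque<>();
        for (int i = 1; i <= 205; i++) {
            temp.addLast("row" + i);
        }
        List<List<String>> batches = drain(temp, 100);
        System.out.println(batches.size() + " batches, last has "
                + batches.get(batches.size() - 1).size() + " rows"); // 3 batches, last has 5 rows
    }
}
```

Note that `drain` empties the queue it is given, which mirrors the "loop until empty" stop condition.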

Re: How to execute on every 100 rows of data?

Hi all, just wondered whether this ever got resolved, as I am facing exactly the same problem. Thanks, Ash.

Re: How to execute on every 100 rows of data?

I would use a tMap with a filter and a sequence. Every time the sequence value is an integer multiple of 100, the filter opens.

Re: How to execute on every 100 rows of data?

@jlolling, thanks for getting back to me so fast!
Sorry to be a pain, but would you mind elaborating a little more? I'm still fairly new to the tool. So I put a filter inside the tMap?

Re: How to execute on every 100 rows of data?

Please read the Talend documentation about filters in tMap:
https://help.talend.com/search/all?query=tMap+operation&content-lang=en
Regards,
Laurent

Re: How to execute on every 100 rows of data?

In the tMap, use this expression as the filter:
(Numeric.sequence("index",1,1) % 100) == 0

If you also need to filter in other places in your job (including child or parent jobs), make sure you give each sequence a different name, because sequences are static: two filters using the same name would advance the same counter.
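To see why the filter passes only every 100th row, and why the sequence names matter, here is a plain-Java model of a static, name-keyed counter. It mimics the behavior described in this thread (the first call for a name returns the start value, each later call adds the step); `SequenceDemo` is a hypothetical class for illustration, not the Talend routine itself.

```java
import java.util.HashMap;
import java.util.Map;

public class SequenceDemo {

    // Static, name-keyed counters modeled on Numeric.sequence: the first call
    // for a name returns startValue, later calls add step to the stored value.
    private static final Map<String, Integer> SEQUENCES = new HashMap<>();

    public static synchronized int sequence(String name, int startValue, int step) {
        Integer next = SEQUENCES.containsKey(name) ? SEQUENCES.get(name) + step : startValue;
        SEQUENCES.put(name, next);
        return next;
    }

    // The tMap filter expression: true only for rows 100, 200, 300, ...
    public static boolean isEvery100th(String seqName) {
        return sequence(seqName, 1, 1) % 100 == 0;
    }

    public static void main(String[] args) {
        int passed = 0;
        for (int row = 1; row <= 250; row++) {
            if (isEvery100th("index")) {
                passed++;
            }
        }
        System.out.println(passed); // 2 (rows 100 and 200)

        // A second filter reusing the name "index" continues the same counter
        // (it gets 251, not 1), which is why each filter needs its own name.
        System.out.println(sequence("index", 1, 1)); // 251
    }
}
```

The model also shows the limitation raised in the next reply: on its own, this filter lets through only the 100th, 200th, ... rows and drops the rows in between, so the intermediate rows have to be collected elsewhere.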

Re: How to execute on every 100 rows of data?

Ah, I see what you're saying, but I'm not sure this solves my problem. I can only get it to process every 100th record on its own, not all the records in between.
I'm trying to build an XML document that gets processed once it holds 100 elements: when it reaches that size, send the data off to the external service, then build the next document with the next 100 rows, and so on until there are no more rows to process. Is there a way to achieve this?
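A hedged sketch of the document-per-100-rows idea in plain Java, kept entirely in memory with nothing written to disk. The `<records>`/`<record>` element names and the `XmlBatchBuilder` class are hypothetical and would need to match the external service's schema; building XML by string concatenation is a simplification, and a real job would more likely use an XML component or library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class XmlBatchBuilder {

    // Escapes the five characters that are special in XML; '&' goes first
    // so that the entities produced by the later replacements are not re-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Builds one document for one batch (hypothetical element names).
    public static String buildDocument(List<String> batch) {
        StringBuilder xml = new StringBuilder("<records>");
        for (String value : batch) {
            xml.append("<record>").append(escape(value)).append("</record>");
        }
        return xml.append("</records>").toString();
    }

    // Groups rows into documents of at most batchSize elements each.
    public static List<String> buildDocuments(List<String> rows, int batchSize) {
        List<String> docs = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += batchSize) {
            docs.add(buildDocument(rows.subList(from, Math.min(from + batchSize, rows.size()))));
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b & c", "d");
        System.out.println(buildDocuments(rows, 2));
        // [<records><record>a</record><record>b &amp; c</record></records>, <records><record>d</record></records>]
    }
}
```

With a batch size of 100, each returned string would be one request body, sent before the next document is built.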

Re: How to execute on every 100 rows of data?

I have an idea; please check whether you can implement it:
- Create a sequence (rowID) for the incoming rows using a sequence generator
- Create a filter that separates the records with rowID > 100 and writes them to a file
- Create a subjob that processes that file in the same way as the steps above
- Once the file is processed, move it to an archive folder
- Repeat the process as long as there is a file to process
Thanks
Vaibhav

Re: How to execute on every 100 rows of data?

@Vaibhav, thanks for the response. I was really hoping to stay away from saving files (I know there is a split feature in tAdvancedFileOutputXML). Do you know of any other way to keep it all in memory without saving to disk?

Re: How to execute on every 100 rows of data?

Then why not use the buffer/hash components?

Re: How to execute on every 100 rows of data?

Okay, I could try. I've only used these (tBufferOutput) before for passing data up to parent jobs, though. How would I use them to pass data at the right time within the same job?

Re: How to execute on every 100 rows of data?


Re: How to execute on every 100 rows of data?

I have had a look here and tried something similar, but couldn't get it to work correctly. I have already pulled all of the data out of the database; I want to loop through those records and do something on every 100th record. I'd rather not make a call to the DB every time I loop. I'm really struggling to understand how to do this in Talend without it becoming really complicated :-(

Re: How to execute on every 100 rows of data?

Hi 
Take a look at this topic; I hope it gives you a hint.
Best regards
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business

Re: How to execute on every 100 rows of data?

Before I start creating a small PoC for you, could you please show your existing job design?

Re: How to execute on every 100 rows of data?

Thanks @Shong, will have a look now.
@sanvaibhav, thanks for taking the time to do this; below is a section of the job I have.


Re: How to execute on every 100 rows of data?

@Shong, I have been looking into your method in more detail. A potential problem is that the dataset could change during runtime; I would then need to add another layer of logic to prevent records from being missed or duplicated. That seems like overkill when I can get the dataset in one statement and should be able to process it later. What are your thoughts? Is there no way I can pull down the full set of data and do this?

Re: How to execute on every 100 rows of data?

Hi AshWhitear
If you don't know how many records the table holds, you can select the total record count in a DB input component and store that value in a context variable or global variable; then use this variable in the 'To' field of the tLoop component.
Best regards
Shong

Re: How to execute on every 100 rows of data?

Hi @Shong, thanks for this, I think it's almost working now. Just a quick question about what you said above: how would I store that much data inside a context var or global var? What datatype would I use, and how would I get all the data into that one var?
I've tried using tHashInput and tHashOutput, but when the hash is cleared the object seems to be removed, and after the first loop I get an error saying the hash isn't initialised. If I could store this data in a context or global variable, it should work perfectly!
Thanks, Ash.

Re: How to execute on every 100 rows of data?

Hi
"how would I store that much data inside a context var or global var? What datatype would I use and how would I get all the data into that one var?"

This variable stores the total number of records, not the data. Define a context variable with int/Integer type. For example:
tMssqlInput --main--> tJavaRow
In tMssqlInput, select the total number of records:
"select count(*) as nb_line from tableName"
In tJavaRow:
context.to = input_row.nb_line;
Best regards
Shong
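Once the record count is in `context.to`, each loop iteration can be treated as the offset of one page of 100 rows. A minimal sketch of those loop bounds, assuming a step of 100; `LoopBounds` and `pages` are hypothetical names. The sketch uses an exclusive upper bound so the last page is never empty; check how your tLoop treats its 'To' value, since an inclusive bound would add one empty iteration when the count is an exact multiple of 100. How each page is then read (for example a SQL Server query with `ORDER BY ... OFFSET n ROWS FETCH NEXT 100 ROWS ONLY`) is an assumption, not something this thread specifies, and a stable `ORDER BY` is needed so that pages neither overlap nor skip rows.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopBounds {

    // Models a loop from 0 to the total record count with a step of batchSize.
    // Each entry is {offset of the first row, number of rows in this page}.
    // The bound is exclusive ("<"), so no empty trailing page is produced.
    public static List<int[]> pages(int total, int batchSize) {
        List<int[]> pages = new ArrayList<>();
        for (int offset = 0; offset < total; offset += batchSize) {
            int size = Math.min(batchSize, total - offset);
            pages.add(new int[] {offset, size});
        }
        return pages;
    }

    public static void main(String[] args) {
        int total = 100_000; // e.g. the count(*) value stored in context.to
        List<int[]> pages = pages(total, 100);
        System.out.println(pages.size()); // 1000
        int[] last = pages.get(pages.size() - 1);
        System.out.println(last[0] + " " + last[1]); // 99900 100
    }
}
```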