How to execute on every 100 rows of data from DB?

Six Stars

How to execute on every 100 rows of data from DB?

Hi,

 

I am pretty sure Talend should be able to do this task relatively easily, but I am not sure of the best way to go about it. I have 100,000 rows of data, but the tSOAP service I am calling can only take 100 rows per call. I would like to execute a tSOAP call on 100 rows at a time until I have looped through the full 100,000-row data set. Any advice or recommended components for going about this would be much appreciated.

 

Thanks,

Pratap.


All Replies
Six Stars

Re: How to execute on every 100 rows of data from DB?

Any update, please?
Nine Stars JR
Nine Stars

Re: How to execute on every 100 rows of data from DB?

Hi

I am not really sure what you have as an input, but I will assume here that it is a CSV file. Set your job up like this (assuming your input file has a header):

tFileInputDelimited (pointing at your file just to count the records - no outputs) 
-- OnSubjobOK -->
tLoop (From: 0 To: ((Integer)globalMap.get("tFileInputDelimited_1_NB_LINE")), Step: 100)
-- Iterate --> 
tFileInputDelimited (pointing at your file again but this time with Header: 1 + ((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")) and Limit: 100)
-- Main --> 
tSOAP (or whatever you want to do after this)

This will split your input into batches of 100. If you are using a different kind of input (e.g. an Oracle table), counting the records and querying the table will of course look different.

 

Let us know if this helps.
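The Header/Limit arithmetic in the setup above can be sketched in plain Java (outside Talend) to see how the batches line up; the row count and batch size here are illustrative stand-ins for the component values:

```java
// Sketch of the batch arithmetic behind the tLoop + Header/Limit setup.
public class BatchSketch {

    // First and last data row (1-based, inclusive) read in the iteration
    // whose tLoop CURRENT_VALUE is `current`.
    static int[] batchRange(int current, int batch, int totalRows) {
        int first = current + 1;                          // Header: 1 + CURRENT_VALUE skips past these
        int last = Math.min(current + batch, totalRows);  // Limit: 100 caps the batch
        return new int[] { first, last };
    }

    public static void main(String[] args) {
        int totalRows = 250; // stands in for tFileInputDelimited_1_NB_LINE
        int batch = 100;     // the tLoop Step

        // tLoop(From: 0, To: totalRows, Step: batch) yields 0, 100, 200, ...
        for (int current = 0; current <= totalRows; current += batch) {
            int[] r = batchRange(current, batch, totalRows);
            if (r[0] > totalRows) break; // nothing left to read
            System.out.println("loop value " + current
                    + ": data rows " + r[0] + ".." + r[1]);
        }
    }
}
```

With 250 rows this yields the windows 1..100, 101..200, and 201..250, i.e. no overlap between batches.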

Six Stars

Re: How to execute on every 100 rows of data from DB?

Thanks :)
It seems to be working, but I have 220 customers in the .csv file, and when I run the iteration with 100, 300 customers get processed - I suspect duplicate records are also being processed. How can we fix this issue?
Nine Stars JR
Nine Stars

Re: How to execute on every 100 rows of data from DB?

Hi

Have you tried it? If you set the Limit to 100 and there are only 20 records left from where you start (via the Header value), it will only process those 20 records. I do not really know what you mean by duplicates.

Six Stars

Re: How to execute on every 100 rows of data from DB?

Yes, I tried, but some customers are coming into the flow again.
Here is the implementation I used:
tFileInputDelimited (pointing at the file just to count the records - no outputs) - only the file location is set; I did not set any header or limit here
-- OnSubjobOK -->
tLoop (From: 0, To: ((Integer)globalMap.get("tFileInputDelimited_1_NB_LINE")), Step: 100)
-- Iterate -->
tFileInputDelimited (pointing at the file again, this time with Header: 1 + ((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")) and Limit: 100) - configured exactly as above, nothing else changed
-- Main -->
tSOAP (used as the final step)
Nine Stars JR
Nine Stars

Re: How to execute on every 100 rows of data from DB?

Now I see what you mean by duplicates. You will have to check the calculations to make sure a single record is not processed again in another batch. I do not know whether your file has any headers or footers in it. Please try to understand the calculations done here and adapt them to your needs; unfortunately, I cannot give you a ready-made perfect solution here.

Six Stars

Re: How to execute on every 100 rows of data from DB?

Here is what is happening. I have a file with values:
123451
123456
124578
124574
125485
... up to 1000 rows.
Output:
123451, 123456, 124578, 124574, 125485, ... (100 rows)
123456, 124578, 124574, 125485, ... (100 rows)
124578, 124574, 125485, ... (100 rows)
124574, 125485, ... (100 rows)
Each batch starts with the second id/customer of the previous batch, so duplicate ids are being sent.
Please suggest a solution if you have one.
Nine Stars JR
Nine Stars

Re: How to execute on every 100 rows of data from DB?

There is something wrong with the Header value in the second file component. Looking at this, I would assume that you have used ((Integer)globalMap.get("tLoop_1_CURRENT_ITERATION")) instead of ((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")). Correct?
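The mix-up is easy to make because the two globals advance differently: CURRENT_ITERATION counts iterations (1, 2, 3, ...) while CURRENT_VALUE advances by the Step (0, 100, 200, ...). Using the iteration count in the Header moves the window forward by only one row per batch, which produces exactly the overlapping batches described above. A small standalone sketch (the loop bounds here are illustrative):

```java
// Compares the two tLoop globals for From: 0, To: 300, Step: 100.
public class LoopVars {
    public static void main(String[] args) {
        int from = 0, to = 300, step = 100;
        int iteration = 0;
        for (int value = from; value <= to; value += step) {
            iteration++; // tLoop_1_CURRENT_ITERATION: 1, 2, 3, 4
                         // tLoop_1_CURRENT_VALUE:     0, 100, 200, 300
            System.out.println("iteration=" + iteration
                    + "  value=" + value
                    + "  Header(correct)=" + (1 + value)    // skips past all prior batches
                    + "  Header(wrong)=" + (1 + iteration)); // skips only one extra row
        }
    }
}
```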

Six Stars

Re: How to execute on every 100 rows of data from DB?

Yes, exactly, it is working fine now. Thanks :)
Six Stars

Re: How to execute on every 100 rows of data from DB?

Could you please let us know how we can loop over a DB table instead of using tFileInputDelimited (pointing at a file just to count the records)?
Six Stars

Re: How to execute on every 100 rows of data from DB?

I have a similar use case, except I need to execute (or chunk) every 100 rows from an Oracle DB into a tSOAP call.

 

Thanks

Nine Stars JR
Nine Stars

Re: How to execute on every 100 rows of data from DB?

For Oracle DBs, you would have to get the row count first by using a tDBRow component and performing a "SELECT COUNT(*) FROM XXX WHERE YYY". Use a tFlowToIterate component after that and store the only result record (there can be only one) via a tSetGlobalVar in a global variable.

Instead of the second tFileInputXXX component, you would use a tDBInput component and for Oracle, you would use a statement like the following:

 

"SELECT A, B, C, D FROM XXX WHERE YYY AND ROWNUM > " + String.valueOf(((Integer)globalMap.get("tLoop_1_CURRENT_VALUE"))) + " AND ROWNUM <= " + String.valueOf(((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")) + 100)

 

I think PostgreSQL and MySQL (and others) do not have the ROWNUM pseudo-column - you will have to find your own way here...
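One caveat worth checking before relying on this: in Oracle, a bare `WHERE ROWNUM > n` filter returns no rows for n >= 1, because ROWNUM is assigned only as rows are accepted by the query. A common workaround is to materialize ROWNUM in a subquery first. A minimal sketch of how the tDBInput query string could be built (the table `XXX`, condition `YYY`, and column list are the placeholders from the post above):

```java
// Builds a paginated Oracle query; ROWNUM is materialized in a subquery
// so it can be filtered with a lower bound.
public class OraclePaging {

    static String pageQuery(int offset, int pageSize) {
        return "SELECT A, B, C, D FROM ("
             + " SELECT A, B, C, D, ROWNUM AS RN FROM XXX WHERE YYY"
             + ") WHERE RN > " + offset + " AND RN <= " + (offset + pageSize);
    }

    public static void main(String[] args) {
        // In the job, offset would come from
        // ((Integer) globalMap.get("tLoop_1_CURRENT_VALUE")).
        System.out.println(pageQuery(100, 100)); // second batch: rows 101..200
    }
}
```

For PostgreSQL and MySQL the same windowing is usually done with LIMIT/OFFSET instead of a ROWNUM subquery.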
