How to process the rows in batches

Six Stars

How to process the rows in batches

Hi,

 

I have a requirement where around 2000 file names are stored in an MSSQL table. I want to read those files in batches of 500 rows and then process them. After that job completes, it should pick the next 500 files and repeat the same process until the last file.

How can I achieve this?

 

Thank you in advance,

Sonali

 



All Replies
Employee

Re: How to process the rows in batches

Hi,

 

You cannot take the data and split it into batches directly using Talend components; you would have to employ custom components in between to process them as multiple batches.

 

The ideal approach is to do a single read from the MSSQL database to fetch all 2000 records in one go and add a tFlowToIterate component to process the files one after another. That way all your files are processed and, at the same time, you make a single database fetch instead of multiple calls.
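As a rough sketch (the table and column names below are just placeholders), the job could look like this:

tMSSqlInput --(row1)--> tFlowToIterate --(iterate)--> tRunJob

with a query such as "SELECT file_name FROM file_table" in tMSSqlInput. tFlowToIterate publishes each column of the current row to globalMap, so the child job can receive the file name through a tRunJob context parameter set to ((String) globalMap.get("row1.file_name")).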

 

But if you would like to process 500 records in parallel, it becomes a complex use case for a standard job, where you would have to employ tParallelize and other custom code components.

 

A better idea would be to merge all the files (assuming the schemas are the same) and process them using Big Data Spark jobs, which would be quite fast.

 

I assume your use case is simple, so try option 1 first; if it takes too long, go for the Big Data job option.

 

Please mark the topic as resolved if the answer has helped you. Kudos are also welcome :-)

 

Warm Regards,

 

Nikhil Thampi

 

 

 

 

Six Stars

Re: How to process the rows in batches

Hi Nikhil,
My current job reads all the files and then runs another job (tRunJob) for the transformation. Following your solution, the flow would be:
tMSSqlInput -> tFlowToIterate -> tRunJob

But in this flow, how can I define a condition like 500 files in one go?
My question is: for the first 500 files tRunJob should run once, and then for the next 500 files it has to run again.

Thank You,
Sonali Jagtap
Five Stars

Re: How to process the rows in batches

Hi,

We can create the jobs on a parent & child basis:

Parent job: this is where we store the context variable value. The tweak is that we initially create a file and store the context variable value as "0".

[Image: parent.JPG — parent job layout]
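A minimal sketch of such a parent job, assuming the value is kept in a small key;value file (the file name and context variable are hypothetical):

tFileInputDelimited (batch_no.txt, one line such as "batchNo;0")
    --(row1)--> tContextLoad
    --(OnSubjobOk)--> tRunJob (child job)

tContextLoad reads the key;value pair and overwrites context.batchNo for the rest of the run.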

 

Child job: in this job we increment the value and filter the data based on it. In your case, the job will fetch the first 500 records, and on the next run it will take the next 500 records.
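For example (names are hypothetical, and OFFSET/FETCH requires SQL Server 2012 or later), the query in the child job's tMSSqlInput could be built from that stored value:

"SELECT file_name FROM file_table ORDER BY id
 OFFSET " + (Integer.parseInt(context.batchNo) * 500) + " ROWS
 FETCH NEXT 500 ROWS ONLY"

so a stored value of 0 returns rows 1-500, a value of 1 returns rows 501-1000, and so on.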

At the end of the child job we store seq1 (2 in this example), which serves as the context variable value for the next run.
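That write-back could be a tJava component at the end of the child job; a minimal sketch using a plain counter instead of the seq1 shown in the screenshots (file and variable names are hypothetical):

// increment the batch number and write it back for the next run
int next = Integer.parseInt(context.batchNo) + 1;
try (java.io.FileWriter fw = new java.io.FileWriter("batch_no.txt")) {
    fw.write("batchNo;" + next);
} catch (java.io.IOException e) {
    throw new RuntimeException(e);
}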

[Images: child.JPG, child1.JPG — child job layout]
In my example I used a sequence (batch size) of 2.

Output will be:

[Images: 1run.JPG — first run; 2run.JPG — second run]

Please let me know if you need more clarification.

 

Regards,

Akash

Six Stars

Re: How to process the rows in batches

It worked for me.

 

Thank you so much, Akash.

 
