Talend Job running Slower with multiple Update Tables in the Same Job

Five Stars


Hello Team,

 

I am working on a data anonymisation project and using Talend as the tool to anonymise the data.

The design is to do the anonymisation in situ.

As anonymisation is required only for the PPI fields, we take one table as the source, which holds the anonymised data, and update the same fields in multiple tables (6, to be exact) within the same job.

As the data volume in the output tables is enormous (ranging from 11 million to 48 million rows), the load was not possible in one go. Hence we used thread logic that divides the job into 8 threads to speed up the process.

Hence, there is a restartability factor involved.

As an observation, 10,000 rows × 8 threads from the input takes approximately 2 hours, which is very slow.
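For context on how such a thread split is usually computed, here is a minimal, hypothetical sketch (plain Python, not Talend code) of dividing a numeric key range into 8 contiguous sub-ranges, one per thread; the 8-thread count mirrors the post, and the function name and key-range approach are illustrative assumptions:

```python
def split_key_range(min_id, max_id, n_threads=8):
    """Divide [min_id, max_id] into n_threads contiguous sub-ranges.

    Each thread processes only its own (lo, hi) slice, which also gives
    a natural restart point: on failure, rerun only the failed slice.
    """
    total = max_id - min_id + 1
    chunk = -(-total // n_threads)  # ceiling division
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges
```

A slice-per-thread split like this only balances the load if the key values are reasonably evenly distributed across the range.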

 

Hence, we tried dividing the work into 6 different jobs, where each job updates one table. The tables with 24 million and 48 million rows were still slow, so for those two tables we switched to an insert method.

This change reduced the time, but we have a concern: with the insert design, the schema must include all fields of the tables, so adding any new field in the future would require a code change. This concern was raised, and hence the design was rejected.

 

We also thought about tOracleOutputBulkExec; however, it has restartability issues.

 

Would you kindly have a look at the attached design and recommend how to make it more efficient?

 

Please let me know if I have missed providing any information.

 

Regards,

Venu

Student Loan Company, 100, Bothwell Street, Glasgow 


Accepted Solutions
Five Stars

Re: Talend Job running Slower with multiple Update Tables in the Same Job

Thanks for the suggestion; we will try to implement the recommendation and see the results.

 

 

Regards,

Venu


All Replies
Nine Stars

Re: Talend Job running Slower with multiple Update Tables in the Same Job

What is your Oracle cursor size set to?

How often are you committing?

Five Stars

Re: Talend Job running Slower with multiple Update Tables in the Same Job

Hello Team,

We have used the default cursor size in the tDBInput component, which is 1000.
The commit happens after a specific number of rows processed, which we set in the child job (say, 1 million).
So 1 million records are sourced from the input table.
Once these go for update to the output tables, more or fewer records will be matched. Hence for some tables the updates are in the thousands, and for others in the multi-millions.
Example of one run:
Source Input Table - 1 Million (join of 2 tables)
Target Output 1 - 1 Million (one of the source tables)
Target Output 2 - 700 K
Target Output 3 - 150 K
Target Output 4 - 900 K
Target Output 5 - 1.48 Million
Target Output 6 - 745 K

The commits happen after this for one thread, and likewise for all 8 threads.
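To make the commit cadence concrete, here is a driver-agnostic sketch (plain Python; `execute` and `commit` are stand-ins for the database calls Talend generates, and the 1-million default mirrors the batch size described above):

```python
def apply_updates(rows, execute, commit, batch_size=1_000_000):
    """Run an update per row and commit every batch_size rows.

    A smaller batch_size commits more often (more commit overhead, but
    less undo/redo and fewer locks held at once); a larger one holds
    locks and undo longer between commits.
    """
    pending = 0
    for row in rows:
        execute(row)
        pending += 1
        if pending >= batch_size:
            commit()
            pending = 0
    if pending:
        commit()  # flush the final partial batch
```

With 8 threads each committing on this cadence against the same 6 target tables, the commit interval directly shapes how long row locks are held across threads.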

Please let me know for any other questions.


Regards,
Venu

Nine Stars

Re: Talend Job running Slower with multiple Update Tables in the Same Job

Hi,

 

Also something to consider: are the threads processing data in one table/partition, or is a given thread trying to update all partitions (if partitions even exist)? If the table to be updated is partitioned in some way, try to get each thread to update only one partition at a time.
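One way to honour that advice is a static partition-to-thread assignment, so no two threads ever touch the same partition concurrently; a hypothetical sketch (the partition names and round-robin scheme are illustrative, not a Talend feature):

```python
def assign_partitions(partitions, n_threads):
    """Give thread i partitions i, i+n, i+2n, ... (round-robin).

    Each partition belongs to exactly one thread, so concurrent threads
    never contend for the same partition's blocks or locks, and each
    thread works through its list one partition at a time.
    """
    return {t: partitions[t::n_threads] for t in range(n_threads)}
```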

 

Regards David
Don't forget to give Kudos when an answer is helpful, or mark the answer as the solution.
Nine Stars

Re: Talend Job running Slower with multiple Update Tables in the Same Job

You could probably increase the tDBInput cursor size to 10K or 100K, maybe even 1M.

 

Usually I use 1M; sometimes I have even used 2M. It just depends on how many columns are in the DB table and your Oracle instance's resources.
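The cursor (fetch) size mainly controls how many rows come back per network round trip, so the arithmetic for a 1-million-row slice looks roughly like this (a back-of-the-envelope sketch, not a Talend API):

```python
def fetch_round_trips(row_count, fetch_size):
    """Rows arrive in blocks of up to fetch_size per round trip."""
    return -(-row_count // fetch_size)  # ceiling division
```

At the default of 1000, a 1-million-row slice costs 1000 round trips; at 100K it drops to 10. The trade-off is client and server memory per fetch, which is why the column count and instance resources matter.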

