Insert/Update versus table-file-table

Five Stars

Insert/Update versus table-file-table

Hi ,

 

I have been using insert/update to update or insert a table in mysql from sql server. The job is set up as a cronjob. The job runs every 8 hours. The number of records in the source table is around 400000. Every 8 hours around 100 records might get updated or inserted.

 

I run the job in such a away that at the source level, I only take the modified runs between the last run and the current run.

 

I have observed that just to update / insert 100 rows the time taken is 30 minutes.

 

However, another way was to dump all of the 400000 in a file and then truncate the destination table and insert all of those records all over again. This process is done at every job run

 

So, now may I know why does insert/update take so much time?

 

The job design for insert/update is  as follows

Capture.PNG

Thanks in advance

Rathi

Fifteen Stars TRF
Fifteen Stars

Re: Insert/Update versus table-file-table

Hi,

30 minutes to insert/update only 100 records seems very long.

Do you know elapsed time for each task? 

How long for the tMSSqlInput, tMSSqlRow and tMysqlOutput?

What appens in tMSSqlRow?

 


TRF
Five Stars

Re: Insert/Update versus table-file-table

Ahh, I do not know the elapsed time for each component. 

 

The tmmsqlrow updates the time in a table about the end of time of the job

 

Thanks 

Rathi

Five Stars

Re: Insert/Update versus table-file-table

Hi TRF, 

 

Are there any other kind of statistics that I need to gather to help explain the problem?

 

Thanks

Rathi

Fifteen Stars TRF
Fifteen Stars

Re: Insert/Update versus table-file-table

As you have a tStatCatcher subjob, just check the option "tStatCatcher statistics" on Advanced settings for all critical components.

If possible, compare the elapsed time when doing operations outside of Talend.


TRF
Community Manager

Re: Insert/Update versus table-file-table

Possible reasons for this (given what you have said) are....

1) You are not updating on a primary key or an indexed key. Do you have any indexes on your insert table? You may only update/insert 100 records but you say you are actually processing 400000 records. That is potentially n(number of records in your source table) x 400000 comparisons (possibly full table scans).
2) Do you have insert/update triggers on your insert table? These will add some latency.....but I wouldn't expect this to be your cause if I am honest.

 

Calling Talend Open Studio Users

The first 100 community members completing the Open Studio survey win a $10 gift voucher.

Start the survey

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog

How Media Organizations Achieved Success with Data Integration

Learn how media organizations have achieved success with Data Integration

Read

Definitive Guide to Data Quality

Create systems and workflow to manage clean data ingestion and data transformation.

Download