Five Stars

How can I speed up insert or update action on a big table?

I want to execute insert or update action on a MS SQL table 'protect_realised_revenue' which contained more than one million rows and is still growing fast.

I create a job as shown and the data is from the raw table in PostgreSQL (main) and look-up table is the MS SQL table itself.  

 

1.png

 

I set up the tMap_1 as below:

 

Insert those with imsid not in MS SQL table;

Update those within left outer join scope but different last_modified datetime.

 

2.png

 

* I know of a fact if the MS SQL table had no extra column __insert_Datetime, then I don't need to bother using tMap_1 but link directly two components and choose 'insert or update' action for the output.

 

When I run the job, I found Talend actually update/insert every rows from the raw table and it takes only 30-40 rows/s.

 

Ideally, as what we can do in SQL Server query, we just need to concern about new imsid or distinct last_modified and skip most of the lines in the target MS SQL table.

 

So, could anyone please suggest how can I re-develop my job to reach a more efficient running process?

 

Cheers 

 

  • Data Integration
3 REPLIES
Four Stars

Re: How can I speed up insert or update action on a big table?

Hi ,

You can make the lookup join as inner join and create 2 outputs in the tmap. One for insert and one for update.

Insert flow should have the catch output reject set to true and the update flow should have catch output reject to false.

Also do not bring in all the columns from the lookup. Just keep the key columns and the columns which need to be propagated to the output which will increase the lookup performance significantly.

Thanks
Raghu
Five Stars

Re: How can I speed up insert or update action on a big table?

Thank you very much for helpful solution.

 

May I ask you three more questions to clarify?

 

  • If I use inner join in tMap, do I have to change the setting of 'Catch output reject' or 'Catch lookup inner join reject'?
  • The inner join condition was: 1) equal id 2) equal last modified time. If id does not exist, we insert; if last modified time is different, we update. So, as mentioned in reply, if I choose reject 'true' for insert, but keep update false in reject, will it happen that all records our of inner join was inserted, including those with only different last modified time, and meanwhile the update action didn't do anything?

         1.png

 

  • Can I set both update and insert output with Catch lookup inner join reject as true and expect two types of actions were taken correctly under this setting?

 

Please suggest and much appreciated.

 

Tags (1)
Five Stars

Re: How can I speed up insert or update action on a big table?

I tried the setting in t_Map,

 

2.png

 

 

found a issue that when the data was loaded up to multiples of 10,000 rows, the process would cease for a couple of seconds before it resume.

 

Could you please suggest how can I get it continuously run without 'idle' span?

 

1.png

 

 

Cheers

 

 

Tags (1)