I want to run an insert-or-update action on an MS SQL table 'protect_realised_revenue', which contains more than one million rows and is still growing fast.
I created a job as shown: the source data comes from the raw table in PostgreSQL (main), and the lookup table is the MS SQL table itself.
I set up tMap_1 as follows:
Insert rows whose imsid is not in the MS SQL table;
Update rows within the left outer join scope that have a different last_modified datetime.
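To make the intended tMap logic concrete, here is a minimal Python sketch of the same delta check (the function name and the dictionaries are illustrative stand-ins for the tMap lookup, not Talend code):

```python
from datetime import datetime

def classify_rows(source_rows, target_index):
    """Split source rows into inserts, updates, and skips.

    source_rows:  iterable of dicts from the PostgreSQL raw table
    target_index: dict mapping imsid -> last_modified, standing in for
                  the MS SQL lookup table
    """
    inserts, updates, skips = [], [], []
    for row in source_rows:
        existing = target_index.get(row["imsid"])
        if existing is None:
            inserts.append(row)            # imsid not in target: insert
        elif existing != row["last_modified"]:
            updates.append(row)            # datetime differs: update
        else:
            skips.append(row)              # unchanged: skip entirely
    return inserts, updates, skips

# tiny example
target = {1: datetime(2020, 1, 1), 2: datetime(2020, 1, 2)}
src = [
    {"imsid": 1, "last_modified": datetime(2020, 1, 1)},  # unchanged
    {"imsid": 2, "last_modified": datetime(2020, 2, 2)},  # changed
    {"imsid": 3, "last_modified": datetime(2020, 3, 3)},  # new
]
ins, upd, skp = classify_rows(src, target)
print(len(ins), len(upd), len(skp))  # 1 1 1
```

The point is the third bucket: unchanged rows should be filtered out before they ever reach the output component, instead of being re-written to the target table.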
* I know for a fact that if the MS SQL table had no extra column __insert_Datetime, I wouldn't need tMap_1 at all; I could link the two components directly and choose the 'insert or update' action on the output.
When I run the job, I find that Talend actually updates/inserts every row from the raw table, at only 30-40 rows/s.
Ideally, as we would do in a SQL Server query, we only need to handle rows with a new imsid or a changed last_modified, and skip most of the rows already in the target MS SQL table.
So, could anyone please suggest how I can redesign my job to run more efficiently?
Thank you very much for any helpful solution.
May I ask three more questions to clarify?
Any suggestions are much appreciated.
I tried the suggested setting in tMap and found an issue: whenever the loaded row count reaches a multiple of 10,000, the process stalls for a couple of seconds before resuming.
Could you please suggest how I can keep it running continuously, without these 'idle' spans?