I want to execute an insert-or-update action on an MS SQL table, 'protect_realised_revenue', which contains more than one million rows and is still growing fast.
I created a job as shown: the main flow comes from the raw table in PostgreSQL, and the lookup table is the MS SQL table itself.
I set up tMap_1 as below (a sketch of the filter expressions follows the list):
Insert rows whose imsid is not in the MS SQL table;
Update rows that fall within the left outer join scope but have a different last_modified datetime.
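Concretely, the two output filters in tMap_1 look roughly like this (a sketch in Talend's Java expression syntax; row1 stands for my PostgreSQL main flow and row2 for the MS SQL lookup, so the row names are illustrative and the null check assumes the lookup is set to left outer join):

    // "insert" output: the left outer join found no match, so this imsid is new
    row2.imsid == null

    // "update" output: matched by the join, but the timestamps differ
    row2.imsid != null && !row1.last_modified.equals(row2.last_modified)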
* I know for a fact that if the MS SQL table had no extra column __insert_Datetime, I wouldn't need to bother with tMap_1 at all; I could link the two components directly and choose the 'insert or update' action for the output.
When I run the job, I find that Talend actually updates/inserts every row from the raw table, and throughput is only 30-40 rows/s.
Ideally, as we can do in a plain SQL Server query, we should only need to care about new imsid values or changed last_modified values, and skip most of the rows in the target MS SQL table.
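For comparison, this is the kind of set-based statement I have in mind on the SQL Server side (a sketch only; the staging table name and the column list beyond imsid and last_modified are illustrative, not my actual schema):

    MERGE protect_realised_revenue AS tgt
    USING staging_raw AS src            -- rows freshly loaded from PostgreSQL
        ON tgt.imsid = src.imsid
    WHEN MATCHED AND tgt.last_modified <> src.last_modified THEN
        UPDATE SET last_modified = src.last_modified   -- plus any other changed columns
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (imsid, last_modified, __insert_Datetime)
        VALUES (src.imsid, src.last_modified, GETDATE());

Most source rows hit neither branch and are simply skipped, which is exactly what my Talend job fails to do.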
So, could anyone please suggest how I can redevelop my job to make it run more efficiently?
Thank you very much for the helpful solution.
May I ask you three more questions to clarify?
Please advise; much appreciated.
I tried the suggested setting in tMap and found an issue: whenever the loaded data reaches a multiple of 10,000 rows, the process stalls for a couple of seconds before it resumes.
Could you please suggest how I can keep it running continuously, without these 'idle' spans?