You can implement the incremental load as follows:
a) Extract the last run date from the DB and store it in a context variable. Also store the current system time in another context variable.
b) Execute the SQL query against the source database to extract the incremental data. Make sure the query's WHERE clause uses both the last run date and the current system date stored in the context variables, as in the sketch after this list.
c) Once the data extraction is complete, write the new run date from the context variable back to the configuration table that stores the last run dates.
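A minimal SQL sketch of these three steps. All table and column names (etl_last_run, source_table, last_modified_date) are illustrative, and in Talend the two timestamp literals would come from the context variables populated in step a:

```sql
-- Step a: read the previous cut-off into a context variable
-- (e.g. context.lastRunDate); context.currentRunDate is set to now().
SELECT last_run_date FROM etl_last_run WHERE job_name = 'my_delta_job';

-- Step b: extract only the rows changed inside the window.
-- The two literals below stand in for the context variables.
SELECT *
FROM   source_table
WHERE  last_modified_date >  '2024-01-01 10:00:00'   -- context.lastRunDate
  AND  last_modified_date <= '2024-01-02 10:00:00';  -- context.currentRunDate

-- Step c: after a successful load, persist the new cut-off.
UPDATE etl_last_run
SET    last_run_date = '2024-01-02 10:00:00'          -- context.currentRunDate
WHERE  job_name = 'my_delta_job';
```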
Another approach to extracting the delta data is to implement CDC (Change Data Capture) on the source tables. The link below gives details about Talend's CDC capabilities.
Note:- If the suggestion provided has helped resolve your query, could you please mark the topic as solved? It will enrich the Talend community.
Unfortunately, I do not have a backup of the job I had created, but it is a very straightforward implementation.
Please feel free to get in touch if you ever get stuck during job creation. We are always here to help.
Apologies for the delay as I was on vacation.
The last run date captures the timestamp that was used as the cut-off for fetching the delta records. Whenever you run the delta job, the data fetch should cover the window between the last run date+timestamp and the current date+timestamp.
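One detail worth making explicit is the boundary handling: a strict comparison on the lower bound and an inclusive one on the upper bound ensures a row stamped exactly at the previous cut-off is not fetched twice. A sketch with illustrative timestamps:

```sql
-- Half-open window (last_run, current_run]: rows stamped exactly at the
-- previous cut-off were already picked up by the previous run.
SELECT *
FROM   source_table
WHERE  last_modified_date >  '2024-01-01 10:00:00'   -- previous run's cut-off
  AND  last_modified_date <= '2024-01-02 10:00:00';  -- this run's cut-off
```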
There are no PDF documents for this process, but you can easily create a job flow based on this concept. If you get stuck somewhere, please feel free to create a new topic with your job flow and component error screenshots, and we will be there to help you :-)
Hi @nikhilthampi ,
I would like your opinion on delta loads for tracking inserts and updates using last_modified_date.
Is it better to use the Insert or Update option in the output component (after filtering the source for records greater than last_updated_timestamp), or to add a left outer join in tMap, updating the matched records and inserting the rejects?
Will there be a difference in performance? What is the recommended approach?
A lookup in tMap can be a costly option if your lookup tables have millions of rows. In that case, it is better to select a key column in your output DB component and use the Insert or Update option. It will then use the primary key of the underlying table, which will be faster.
That said, it is not a hard and fast rule for all scenarios and databases. Some DBs/DWs give maximum throughput for insert-only transactions, whereas others give decent performance for both inserts and updates in the same flow.
So always run performance tests before deciding on the right option.
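For reference, the Insert or Update option effectively behaves like a database-side upsert keyed on the primary key, which the tMap left-outer-join pattern reproduces manually. A minimal sketch of the equivalent statement, assuming PostgreSQL syntax and an illustrative table (other databases would use MERGE instead):

```sql
-- Upsert keyed on the primary key (PostgreSQL ON CONFLICT syntax).
INSERT INTO target_table (id, name, last_updated_timestamp)
VALUES (42, 'example', '2024-01-02 10:00:00')
ON CONFLICT (id)
DO UPDATE SET name = EXCLUDED.name,
              last_updated_timestamp = EXCLUDED.last_updated_timestamp;
```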
Please appreciate our Talend community members by giving Kudos for sharing their time on your query. If your query is answered, please mark the topic as resolved :-)