One Star

Incremental load, building a Fact table

Do you have any tips, tutorials, or specific components for performing an incremental load of tables?
I want to load all new records from source into target, where target does not originally have those records, and I need to be able to handle errors. Say 1000 records in source need to be inserted into target, and the job fails after inserting 500 of them. If I have to rerun the job, how do I skip those 500 already-inserted records? The database is Oracle.
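One common rerun-safe pattern is to make the insert itself idempotent: only insert rows that are not already in the target, keyed on the natural key. Below is a minimal sketch using Python's sqlite3 in place of Oracle; the table and column names (`src`, `tgt`, `emp_id`) are made up for illustration. In Oracle the same idea would typically be a `MERGE` or an `INSERT ... SELECT ... WHERE NOT EXISTS` keyed on the business key.

```python
import sqlite3

# Hypothetical source/target tables; sqlite stands in for Oracle here.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src (emp_id INTEGER PRIMARY KEY, emp_name TEXT);
    CREATE TABLE tgt (emp_id INTEGER PRIMARY KEY, emp_name TEXT);
    INSERT INTO src VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- Simulate a failed run that committed only part of the data:
    INSERT INTO tgt VALUES (1, 'a');
""")

def load_new_rows(con):
    # Insert only rows missing from the target, so a rerun skips
    # whatever a previous (failed) run already committed.
    con.execute("""
        INSERT INTO tgt (emp_id, emp_name)
        SELECT s.emp_id, s.emp_name
        FROM src s
        WHERE NOT EXISTS (SELECT 1 FROM tgt t WHERE t.emp_id = s.emp_id)
    """)
    con.commit()

load_new_rows(con)  # rerun-safe: inserts only emp_id 2 and 3
load_new_rows(con)  # running again inserts nothing
print(con.execute("SELECT COUNT(*) FROM tgt").fetchone()[0])  # 3
```

The anti-join makes the job restartable regardless of where the previous run stopped, at the cost of re-scanning the target on each run; an index on the key keeps that cheap.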
Overall goal:
Step 1: Insert a record into an audit table with SeqID, JobName, JobStartDt, Status, DataFrom and DataTill.
Step 2: Load the actual dimension/transaction data where ChangeDt >= DataFrom and ChangeDt < sysdate.
This involves both inserts and updates.
Step 3: Run the first flow again to update JobEndDt and set Status to Complete.
The above steps need to run daily for each job.
TabA -- (Audit Table) 
SeqID, JobName, JobStartdt,JobEndDt, Status, DataFrom,DataTill 
1, job1, 20/8/2015 01:30 AM, 20/8/2015 01:50 AM, Complete,18/8/2015, 19/8/2015 
2, job1, 21/8/2015 01:30 AM, null, 'Error',19/8/2015,20/8/2015 
3, job1, 22/8/2015 01:30 AM, , ,19/8/2015,21/8/2015 
TabB -- 
EmpID, EmpName, DOJ, Address1, Address2, ChangeDt, LoadDt 

An address change may occur multiple times for a user; I need each of those records loaded into the target.
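The three-step audit flow described above can be sketched roughly as follows. This is a minimal illustration using Python's sqlite3 rather than Oracle; the table and column names follow TabA/TabB from the post, while the status values, window handling, and helper names are assumptions.

```python
import sqlite3

# sqlite stands in for Oracle; taba is the audit table, tabb the data.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE taba (seq_id INTEGER PRIMARY KEY AUTOINCREMENT,
                       job_name TEXT, job_start_dt TEXT, job_end_dt TEXT,
                       status TEXT, data_from TEXT, data_till TEXT);
    CREATE TABLE tabb_src (emp_id INTEGER, emp_name TEXT, change_dt TEXT);
    CREATE TABLE tabb_tgt (emp_id INTEGER, emp_name TEXT, change_dt TEXT,
                           load_dt TEXT);
    INSERT INTO tabb_src VALUES (1, 'x', '2015-08-19'),
                                (2, 'y', '2015-08-20');
""")

def run_job(con, job_name, data_from, data_till):
    # Step 1: open an audit row for this run.
    cur = con.execute(
        "INSERT INTO taba (job_name, job_start_dt, status, data_from,"
        " data_till) VALUES (?, datetime('now'), 'Running', ?, ?)",
        (job_name, data_from, data_till))
    seq_id = cur.lastrowid
    # Step 2: load the window DataFrom <= ChangeDt < DataTill.
    con.execute(
        "INSERT INTO tabb_tgt SELECT emp_id, emp_name, change_dt,"
        " datetime('now') FROM tabb_src"
        " WHERE change_dt >= ? AND change_dt < ?",
        (data_from, data_till))
    # Step 3: close the audit row as Complete.
    con.execute(
        "UPDATE taba SET job_end_dt = datetime('now'),"
        " status = 'Complete' WHERE seq_id = ?", (seq_id,))
    con.commit()

run_job(con, "job1", "2015-08-19", "2015-08-21")
print(con.execute("SELECT status FROM taba").fetchone()[0])        # Complete
print(con.execute("SELECT COUNT(*) FROM tabb_tgt").fetchone()[0])  # 2
```

On a rerun after a failure, the job would read the last audit row for the job name: if its status is not Complete, reuse its DataFrom so the same window is reprocessed.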


Re: Incremental load, building a Fact table

Hi ghudson_js1,
Could you please have a look at Talend Change Data Capture (CDC)? It identifies and captures data that has been added to, updated in, or removed from database tables, and makes that change data available for later use by applications or individuals. Please check whether it is what you are looking for:
TalendHelpCenter: CDC architectural overview.
Best regards
Seventeen Stars

Re: Incremental load, building a Fact table

As Sabrina said, have a look at CDC for capturing changed data (in an intrusive way or not).
But you said:
when the job fails after inserting 500 records

What was the reason the job failed?
Was it your data, the network, etc.?
How do you manage transactions?
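The transaction question matters because of how the rerun problem arises in the first place: if the whole load runs in a single transaction, a mid-load failure rolls everything back, and the rerun simply starts from zero instead of from "500 rows already inserted". A minimal sketch, again using sqlite3 in place of Oracle with made-up table names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tgt (emp_id INTEGER PRIMARY KEY)")

# The third row duplicates a key, so the batch fails partway through.
rows = [(1,), (2,), (1,)]

try:
    with con:  # one transaction: commits on success, rolls back on error
        con.executemany("INSERT INTO tgt VALUES (?)", rows)
except sqlite3.IntegrityError:
    pass  # the failed run left nothing behind

print(con.execute("SELECT COUNT(*) FROM tgt").fetchone()[0])  # 0
```

The trade-off is undo/redo space and lock time on large loads, which is why many ETL jobs instead commit in batches and rely on an idempotent insert (or the audit table's DataFrom window) to make reruns safe.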