Restart method of ETL jobs


Hi Forum,
I have been asked by my superiors to find out the following using Talend ETL (I am an infrequent ETL developer in BI).
1) How do we restart the entire ETL job (i.e., the whole project) when it fails?
2) If I get an error in one job, can we still execute the other jobs? (Assuming the jobs in the project are not dependent on each other.)
3) Say I load data from a CSV file of 100 rows and get an error after inserting 50 rows. The next time I run the job, it should take only the remaining 50 rows; it should not read all 100 rows and reject the first 50. (The ETL job has to identify where the error occurred and resume from that point on the next execution.)
Thank you.
Sadakar
 

Re: Restart method of ETL jobs

Hi Sadakar,
1) How do we restart the entire ETL job (i.e., the whole project) when it fails?
>> If we implement an automatic restart, there is a risk of an infinite loop: if the job fails repeatedly, the project execution never stops. One workaround is to have the master job return a code after execution; if the code indicates failure (i.e., 1), execute a job that is a duplicate copy of the master job. A bounded-retry sketch follows below.
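For illustration, here is a minimal bounded-retry sketch in Java, assuming the master job has been exported as a standalone script (the path jobs/master_job/master_job_run.sh is a hypothetical example) and that it returns a non-zero exit code on failure, as exported Talend jobs do. Capping the number of attempts avoids the infinite loop described above.

```java
import java.io.IOException;

// Re-run an exported Talend job once on failure, with a hard retry cap
// so a persistently failing job cannot loop forever.
public class RestartOnFailure {
    private static final int MAX_ATTEMPTS = 2; // initial run + one retry

    public static void main(String[] args) throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            // Hypothetical path to the script produced by exporting the job
            Process job = new ProcessBuilder("sh", "jobs/master_job/master_job_run.sh")
                    .inheritIO() // stream the job's console output to ours
                    .start();
            int exitCode = job.waitFor();
            if (exitCode == 0) {
                System.out.println("Job succeeded on attempt " + attempt);
                return;
            }
            System.err.println("Job failed (exit code " + exitCode + ") on attempt " + attempt);
        }
        System.err.println("Giving up after " + MAX_ATTEMPTS + " attempts.");
    }
}
```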
2) Let's say I get an error in one job; can we execute the other jobs?
>> If you connect your subjobs using a Run if trigger and set its condition to true, then whether the previous subjob fails or succeeds, the next one will execute. The sketch below shows the same idea applied to independently exported jobs.
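Outside the Studio, the same effect can be had from a small driver, sketched here in Java under the assumption that each independent job has been exported as its own script (all three paths are hypothetical). Each failure is logged but does not stop the remaining jobs, which mirrors a Run if trigger with a constant true condition.

```java
import java.io.IOException;
import java.util.List;

// Run several independent exported jobs in sequence; a failure in one
// is reported but never prevents the others from running.
public class RunIndependentJobs {
    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> jobScripts = List.of(
                "jobs/load_customers_run.sh",   // hypothetical job scripts
                "jobs/load_orders_run.sh",
                "jobs/load_products_run.sh");

        for (String script : jobScripts) {
            int exitCode = new ProcessBuilder("sh", script)
                    .inheritIO()
                    .start()
                    .waitFor();
            if (exitCode != 0) {
                System.err.println(script + " failed (exit code " + exitCode + "); continuing.");
            }
        }
    }
}
```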
3) Say I load data from a CSV file of 100 rows and get an error after inserting 50 rows. The next time I run the job, it should take only the remaining 50 rows; it should not read all 100 rows and reject the first 50. (The ETL job has to identify where the error occurred and resume from that point on the next execution.)
>>> Again, it depends on the batch commit size in the Advanced settings of the database output component. If the commit size is set to 1000 rows and the job fails mid-batch, none of the rows in that batch will be committed at all.
- The database output component also has an "insert if not exists" option based on a particular key value; if this option is selected, duplicate records will not be inserted into the database.
- Another option is to track the max ID or insert timestamp in the target database: fetch the last value, then query the source only for data that is not yet in the target. That way, even if some data was already inserted before the failure, the job will not try to insert the same data again (see the sketch after this list).
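To make that last option concrete, here is a minimal watermark sketch over JDBC, assuming source and target are both reachable via JDBC and share a monotonically increasing key column (the table name, column names, connection URLs, and credentials are all hypothetical, and a suitable JDBC driver must be on the classpath). The job reads MAX(id) from the target, then pulls only the source rows above it, so rows committed before a failure are never re-read or re-inserted.

```java
import java.sql.*;

// Resume a load from the last committed row by using the target's
// maximum key value as a watermark into the source query.
public class ResumeFromWatermark {
    public static void main(String[] args) throws SQLException {
        try (Connection target = DriverManager.getConnection(
                     "jdbc:mysql://target-host/dw", "etl", "secret");
             Connection source = DriverManager.getConnection(
                     "jdbc:mysql://source-host/staging", "etl", "secret")) {

            // 1) Read the watermark: highest key already committed in the target.
            long lastId = 0;
            try (Statement st = target.createStatement();
                 ResultSet rs = st.executeQuery("SELECT COALESCE(MAX(id), 0) FROM customers")) {
                if (rs.next()) lastId = rs.getLong(1);
            }

            // 2) Pull only the rows the previous run never committed.
            try (PreparedStatement select = source.prepareStatement(
                         "SELECT id, name FROM customers WHERE id > ? ORDER BY id");
                 PreparedStatement insert = target.prepareStatement(
                         "INSERT INTO customers (id, name) VALUES (?, ?)")) {
                select.setLong(1, lastId);
                try (ResultSet rows = select.executeQuery()) {
                    while (rows.next()) {
                        insert.setLong(1, rows.getLong("id"));
                        insert.setString(2, rows.getString("name"));
                        insert.addBatch();
                    }
                }
                insert.executeBatch(); // commit behavior follows the connection's batch settings
            }
        }
    }
}
```

Inside a Talend job, the same pattern is typically built by reading the watermark with a database input component into a context variable and using that variable in the source query.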

Hope you got the idea.

Thanks
Vaibhav