One Star

Recovery on failed job

Hi,
Could anybody tell me whether Talend Open Studio for Data Integration has an option to resume reading records from the point where a job stopped?
Suppose I have 10 million records in the source. I start the job, and after some time (maybe an hour) it fails because of some issue (network, etc.). By the time it fails, 1 million records have already been transferred to the target. What happens to those records if I re-run the job? Will the job read the data from the beginning and truncate the target, or will it read from the position where it stopped and append to the target?

Thanks in advance.
Best regards,
Sisir
7 REPLIES
One Star

Re: Recovery on failed job

The job will read data from the beginning.
Target data may be truncated or may be appended depending on how you have the target component configured. Please let us know which component you are using for a target, and we can better explain how to configure the component for your needs.
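
To make the two outcomes concrete, here is a minimal plain-SQL sketch in Python with an in-memory SQLite database. It is not Talend's own mechanism; the table name `tgt`, the helper names, and the row counts are all made up for illustration. It assumes a target without a unique key, where a previous run committed 3 of 5 rows before failing, and then re-runs the full load either in append mode or with the target cleared first.

```python
import sqlite3

# Hypothetical source data: 5 rows, of which the failed run committed the first 3.
SOURCE_ROWS = [(i, f"row{i}") for i in range(1, 6)]

def target_after_failed_run():
    """Build a target table holding the rows a failed run already committed."""
    conn = sqlite3.connect(":memory:")
    # No primary key here, so a second load can create duplicates.
    conn.execute("CREATE TABLE tgt (id INTEGER, payload TEXT)")
    conn.executemany("INSERT INTO tgt (id, payload) VALUES (?, ?)", SOURCE_ROWS[:3])
    conn.commit()
    return conn

def rerun_full_load(conn, rows, truncate_first):
    """Re-run the whole load from the beginning; optionally clear the target first."""
    if truncate_first:
        conn.execute("DELETE FROM tgt")
    conn.executemany("INSERT INTO tgt (id, payload) VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM tgt").fetchone()[0]

# Append mode: the 3 rows from the failed run are loaded a second time.
appended_count = rerun_full_load(target_after_failed_run(), SOURCE_ROWS, truncate_first=False)
# Clear-then-load mode: the target ends up with exactly the source rows.
truncated_count = rerun_full_load(target_after_failed_run(), SOURCE_ROWS, truncate_first=True)
```

In append mode the target ends up with more rows than the source because the already-committed rows are inserted again; with a unique key on the target, the same re-run would instead fail on the first duplicate.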
One Star

Re: Recovery on failed job

Hi Wayne,
Thanks for your reply.
Could you please tell me whether Talend has an option to read data starting from the point where the job stopped?
I set the commit interval on the target component to 10000. I don't want to reload the earlier records into my target. If the job fails for any reason (network or anything else), the next run should read the data from where it left off and should not bring over records that have already been committed to the target. Let's say I am moving data from one Oracle table to another Oracle table.

Best regards,
Sisir
Community Manager

Re: Recovery on failed job

There are checkpoints that can help recover from a job execution failure: https://help.talend.com/search/all?query=How+to+recover+Job+execution+in+case+of+failure&content-lan...
but this is only available in the subscription versions of Talend.
One Star

Re: Recovery on failed job

Checkpoints can certainly help with job recovery. However, there is a potential hidden issue when pulling data from relational databases. Relational databases in general, and SQL by definition, do not guarantee the sequence in which rows are returned unless the query specifies one.
For example, if data has been added to the source between runs, the new rows may appear in the middle of data that was already written in the previous run, and the job has to allow for that possibility. An order by clause can reduce the issue, but may or may not be sufficient, and it can also affect the performance of the job. You can reduce the problem by including something like an inserted date or a last-updated date in both the where clause and the order by clause.
Consider checkpoints. Also consider using a flush-and-fill strategy for your job. If flush-and-fill is not possible, consider using something like a last-updated date in your where clause and order by clause to reduce the data issues.
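
As a rough illustration of the where-clause and order-by idea, here is a minimal sketch in Python with an in-memory SQLite database. It is not a Talend feature; the tables `src` and `tgt`, the column `id`, and the function `resume_copy` are hypothetical. It assumes the source has a key that only ever increases (a sequence number here; an inserted date would play the same role) and that both tables are reachable from one connection, which will not be true for two separate Oracle databases.

```python
import sqlite3

def resume_copy(conn, batch_size=2):
    """Copy rows from src to tgt, starting after the highest id already in tgt."""
    cur = conn.cursor()
    # High-water mark: the largest key that an earlier run committed to the target.
    last_id = cur.execute("SELECT COALESCE(MAX(id), 0) FROM tgt").fetchone()[0]
    copied = 0
    while True:
        # The where clause skips rows already loaded; the order by clause makes
        # the read sequence deterministic so the mark is meaningful.
        rows = cur.execute(
            "SELECT id, payload FROM src WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        cur.executemany("INSERT INTO tgt (id, payload) VALUES (?, ?)", rows)
        conn.commit()  # commit per batch, so a failure loses at most one batch
        last_id = rows[-1][0]
        copied += len(rows)
    return copied

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (id INTEGER PRIMARY KEY, payload TEXT);
CREATE TABLE tgt (id INTEGER PRIMARY KEY, payload TEXT);
""")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 8)])
# Simulate an earlier run that committed the first three rows before failing.
conn.execute("INSERT INTO tgt SELECT id, payload FROM src WHERE id <= 3")
conn.commit()

copied = resume_copy(conn)        # copies only the rows the failed run missed
copied_again = resume_copy(conn)  # nothing left to copy on a further re-run
```

The sketch only holds if no row can ever be inserted into the source with a key below the current high-water mark. With a last-updated date, ties at the mark and late updates need extra handling that is not shown here.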
One Star

Re: Recovery on failed job

Thanks Wayne.
I will try to implement the tips you provided and will post the results once I have a significant outcome.

Best regards,
Sisir
One Star

Re: Recovery on failed job

Hi,
How do I set a commit interval on the target? Where can I find the option to enter the number? For example, I want to set a commit interval of 10000. Where can I do that, and is the option available in Open Studio?
Community Manager

Re: Recovery on failed job

The commit interval option is usually in the Advanced settings tab of a database output component, such as tMysqlOutput.
Best regards
Shong
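
For readers wondering what a commit interval does to the target when a job fails, here is a minimal sketch in Python with an in-memory SQLite database. It only imitates the behaviour in plain SQL and is not how the Talend component is implemented; the function `load_with_commit_interval`, its `fail_at` parameter, and the table `tgt` are hypothetical names for illustration.

```python
import sqlite3

def load_with_commit_interval(conn, rows, commit_interval, fail_at=None):
    """Insert rows, committing every commit_interval rows.

    fail_at (hypothetical) simulates a failure just before that row number.
    """
    cur = conn.cursor()
    pending = 0
    for n, row in enumerate(rows, start=1):
        if fail_at is not None and n == fail_at:
            conn.rollback()  # uncommitted rows are discarded, as on a lost connection
            raise RuntimeError("simulated failure")
        cur.execute("INSERT INTO tgt (id, payload) VALUES (?, ?)", row)
        pending += 1
        if pending == commit_interval:
            conn.commit()
            pending = 0
    conn.commit()  # commit the final partial batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tgt (id INTEGER PRIMARY KEY, payload TEXT)")
rows = [(i, f"row{i}") for i in range(1, 11)]

try:
    # Rows 1-4 are committed; rows 5-6 are still pending when row 7 fails.
    load_with_commit_interval(conn, rows, commit_interval=4, fail_at=7)
except RuntimeError:
    pass

committed = conn.execute("SELECT COUNT(*) FROM tgt").fetchone()[0]
```

After the simulated failure the target holds only whole committed batches, so with an interval of 10000 a failed run would typically leave a multiple of 10000 rows in the target.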