Error Recovery Management

Overview

This article introduces the Error Recovery feature and explains how to use this feature in Talend Data Integration.

 

Environment

This procedure was written with:

  • Talend Enterprise Data Integration Professional Edition 5.0.2-r78327 (the product was renamed Talend Data Integration in version 6.0)
  • JDK version: Sun JDK build 1.6.0_26-b03
  • Operating system: Windows XP SP3

Talend verified this procedure to be compatible with all versions of Talend Data Integration.

 

Feature Introduction

Job execution processes can take a long time to finish. Backup and restore operations also take a long time. Talend Data Integration Studio includes a recovery checkpoint capability that you can set up at Job design time to allow processes to be resumed from one of the checkpoints if an error occurs. Job developers can design and integrate error management for specific error conditions using the checkpoint “on-failure” instruction function.

 

Recovery checkpoints can be initiated at specified intervals in the data flow (on trigger connections), to minimize the amount of time and effort required when a Job execution process needs to be restarted due to a failure. The process can be restarted from the last checkpoint prior to the failure, or any other checkpoint before the failure occurred, rather than from the beginning of the Job execution process.
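
The resume-from-checkpoint idea can be illustrated with a small Python sketch. This is a conceptual model only, not how Talend implements the feature: run_job, the checkpoint labels cp1 and cp2, and the three-step job are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def run_job(
    subjobs: List[Callable[[], None]],
    checkpoints: Dict[str, int],
    start_at: Optional[str] = None,
) -> Tuple[bool, Optional[str]]:
    """Run subjobs in order, optionally starting at a checkpoint.

    checkpoints maps a label to the index of the subjob that runs right
    after that checkpoint. Returns (succeeded, last checkpoint passed).
    """
    label_at = {index: label for label, index in checkpoints.items()}
    first = checkpoints[start_at] if start_at is not None else 0
    last_passed = start_at
    for index in range(first, len(subjobs)):
        last_passed = label_at.get(index, last_passed)
        try:
            subjobs[index]()
        except Exception:
            # Stop at the first failure and report where a restart could begin.
            return False, last_passed
    return True, last_passed


# Hypothetical three-step job; the second step fails until the fault is fixed.
output: List[str] = []
fault = {"present": True}


def step_two() -> None:
    if fault["present"]:
        raise RuntimeError("simulated failure")
    output.append("two")


subjobs = [lambda: output.append("one"), step_two, lambda: output.append("three")]
checkpoints = {"cp1": 1, "cp2": 2}

first_run = run_job(subjobs, checkpoints)          # fails in step two, after cp1
fault["present"] = False                           # fix the cause of the failure
second_run = run_job(subjobs, checkpoints, "cp1")  # resume without redoing step one
```

In this sketch the second run starts at cp1, so the work done by the first step is not repeated.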

 

Procedure

This procedure uses an example Job to explain how to:

  • Set recovery checkpoints
  • Deploy the Job
  • Execute the Job from Talend Administration Center
  • Restart the Job from the last checkpoint before the failure

This Job, with relevant files, is available in the attached zip file.

 

Create an example Job

Open a remote project and create an example Job called ErrorRecoveryDemo.

Note: Error recovery settings can be edited only in a remote project. If you open a local project, the error recovery settings are grayed out.

 

This Job consists of four subjobs. See the Job design in the following image:

1_017.png

 

  1. The first subjob starts with tFixedFlowInput_1 and writes the following line of text to the file D:/file/out.txt:

    ******The job begins to run******
  2. The next subjob starts with tFileInputDelimited_1, reads data from the text file D:/demo/file1.txt, and appends the data to the file D:/file/out.txt.
  3. The next subjob starts with tFileInputDelimited_2, reads data from the text file D:/demo/file2.txt, and appends the data to the file D:/file/out.txt.
  4. The last subjob starts with tFixedFlowInput_2 and appends the following line of text to the file D:/file/out.txt:

    ******The job finishes execution******
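
For readers who want to see the data flow without opening the Studio, the four subjobs can be approximated in plain Python. This is only a sketch of what the Job does, not the code Talend generates. It assumes that file1.txt holds the first two records and file2.txt the last two (inferred from the output listings in this article), and it uses a temporary directory instead of the D:/ paths so that it runs anywhere. The names run_demo and append_lines are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable

BEGIN = "******The job begins to run******"
END = "******The job finishes execution******"


def append_lines(target: Path, lines: Iterable[str]) -> None:
    """Append each line to target, creating the file if needed."""
    with target.open("a", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")


def run_demo(file1: Path, file2: Path, out: Path) -> None:
    # Subjob 1: write (not append) the opening line, replacing any old output.
    out.write_text(BEGIN + "\n", encoding="utf-8")
    # Subjob 2: append the records read from file1.txt.
    append_lines(out, file1.read_text(encoding="utf-8").splitlines())
    # Subjob 3: append the records read from file2.txt.
    append_lines(out, file2.read_text(encoding="utf-8").splitlines())
    # Subjob 4: append the closing line.
    append_lines(out, [END])


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    file1, file2, out = base / "file1.txt", base / "file2.txt", base / "out.txt"
    # Assumed input contents, inferred from the article's output listings.
    file1.write_text("1;Shong\n2;Elise\n", encoding="utf-8")
    file2.write_text("3;Mike\n4;Pedro\n", encoding="utf-8")
    run_demo(file1, file2, out)
    result = out.read_text(encoding="utf-8").splitlines()
```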

 

Set up recovery checkpoints

When working in a remote project, you can define checkpoints on the OnSubjobOK and OnSubjobError trigger connections. To define a checkpoint on a subjob trigger connection, perform the following steps:

  1. Click the OnSubjobOK or OnSubjobError trigger connection you want to set as a checkpoint. The basic settings view of the selected trigger connection appears.

    2_016.png

  2. From the Error recovery tab, select the Recovery Checkpoint check box. The icon 3_008.png is appended on the selected trigger connection.
  3. Enter a name for the checkpoint in the Label field, such as checkpoint1. In the Failure Instructions field, enter text to explain the problem and what might cause this type of failure.

    4_008.png

 

Execute the Job in Talend Studio

Execute the Job in Talend Studio to ensure the Job compiles and runs successfully. If you are using the demo Job provided in this article, after importing it into your Studio you must change the default values of the context variables to the local paths of the files required by the Job. Follow these steps:

  1. Open the Contexts view, then click the Values as table tab.

  2. Change the default value of the two context variables to your local file path.

    5_007.png

  3. Go to the Run view and click the Run button or press F6 to execute the Job. The following data is written to the file D:/file/out.txt:

    ******The job begins to run******
    1;Shong
    2;Elise
    3;Mike
    4;Pedro
    ******The job finishes execution******

 

Deploy the Job on Job Conductor

  1. Deploy the Job on the Job Conductor of Talend Administration Center. Refer to Working with Job execution tasks for more information on how to deploy a Job on the Job Conductor. Here we add a task called ErrorRecovery on the Job Conductor.

    6_004.png

  2. Select the ErrorRecovery task and click the Run button on the menu bar to check that the Job deployed and executed successfully.

    7_004.png

  3. Open the file D:/file/out.txt and check that you see the same results as when you executed the Job in Talend Studio, as follows:

    ******The job begins to run******
    1;Shong
    2;Elise
    3;Mike
    4;Pedro
    ******The job finishes execution******

 

Simulate Job execution failure

To test the Error Recovery feature and restart the Job from the last checkpoint prior to the failure, this example simulates Job execution failure as follows:

  1. Back up the source file file2.txt required for the third subjob in this example, and then delete it from the source directory D:/demo/.

  2. Click the Run button in the tool bar to execute the selected task:

    9_004.png

  3. The Job execution fails as expected, because the required file file2.txt does not exist in the specified directory. Open the Logs tab to see the last Job execution log message:

    10_004.png

  4. Open the output file D:/file/out.txt and verify that only the following data is written to the file:

    ******The job begins to run******
    1;Shong
    2;Elise

    This data is generated by the first and second subjobs. The Job failed at the third subjob.
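
The partial output above can be reproduced outside Talend with a short Python sketch. It is an illustration under assumptions, not the Job itself: the input contents are inferred from the output listings in this article, a temporary directory stands in for D:/demo/ and D:/file/, and append_file is a hypothetical helper.

```python
import tempfile
from pathlib import Path

BEGIN = "******The job begins to run******"


def append_file(source: Path, target: Path) -> None:
    """Append source to target; raises FileNotFoundError if source is missing."""
    data = source.read_text(encoding="utf-8")
    with target.open("a", encoding="utf-8") as handle:
        handle.write(data)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    file1, file2, out = base / "file1.txt", base / "file2.txt", base / "out.txt"
    file1.write_text("1;Shong\n2;Elise\n", encoding="utf-8")
    file2.write_text("3;Mike\n4;Pedro\n", encoding="utf-8")

    # Back up file2.txt and remove it from the source directory in one move.
    file2.rename(base / "file2.txt.bak")

    failed_subjob = None
    out.write_text(BEGIN + "\n", encoding="utf-8")  # subjob 1
    try:
        for name, source in (("subjob 2", file1), ("subjob 3", file2)):
            append_file(source, out)
    except FileNotFoundError:
        failed_subjob = name  # subjob 4 is never reached

    partial = out.read_text(encoding="utf-8").splitlines()
```

Here failed_subjob ends up as "subjob 3", and partial holds only the opening line and the two records from file1.txt.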

 

Restart the Job from the last available checkpoint

Restart the Job from the last checkpoint prior to the failure, rather than executing the Job again from the beginning of the Job execution process, as follows:

  1. Open the Error Recovery Management module for the last execution of this task by clicking the Recovery last execution button in the tool bar.

    11_004.png

    The Error Recovery Management page opens as shown below.

    12_004.png

  2. Before recovering the Job, read the log message to find the cause of the Job execution failure, and then fix the problem. In this example, the problem is that the source file required by the third subjob does not exist. Restore the source file file2.txt.

  3. Select the task you want to recover and open the Recovery Checkpoints tab.

    13_002.png

    In this example, the third subjob (which starts with tFileInputDelimited_2) failed.

  4. Restart the Job execution from the last checkpoint, checkpoint2, as follows:

    1. Select the checkpoint checkpoint2 and click the Launch recovery button to restart the Job execution at that checkpoint.

      14_002.png

    2. Open the output file D:/file/out.txt. Verify that the data generated by the third and fourth subjobs is appended.

      ******The job begins to run******
      1;Shong
      2;Elise
      3;Mike
      4;Pedro
      ******The job finishes execution******
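
The effect of recovering from checkpoint2 can be sketched in Python as well. The sketch assumes that checkpoint2 sits on the trigger between the second and third subjobs (inferred from the fact that recovery appends the output of the third and fourth subjobs), starts from the partial out.txt left by the failed run, and uses a temporary directory in place of the D:/ paths. recover_from_checkpoint2 is a hypothetical name and is not part of Talend.

```python
import tempfile
from pathlib import Path

BEGIN = "******The job begins to run******"
END = "******The job finishes execution******"


def recover_from_checkpoint2(file2: Path, out: Path) -> None:
    """Run only the subjobs after checkpoint2, leaving earlier output untouched."""
    records = file2.read_text(encoding="utf-8")
    with out.open("a", encoding="utf-8") as handle:
        handle.write(records)      # subjob 3: append the records from file2.txt
        handle.write(END + "\n")   # subjob 4: append the closing line


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    file2, backup, out = base / "file2.txt", base / "file2.txt.bak", base / "out.txt"
    # State left behind by the failed run, as described in the article.
    out.write_text(BEGIN + "\n1;Shong\n2;Elise\n", encoding="utf-8")
    backup.write_text("3;Mike\n4;Pedro\n", encoding="utf-8")

    backup.rename(file2)                  # fix the cause: restore file2.txt
    recover_from_checkpoint2(file2, out)  # rerun subjobs 3 and 4 only
    recovered = out.read_text(encoding="utf-8").splitlines()
```

Because the first two subjobs are not rerun, the opening line and the records from file1.txt appear only once in the recovered output.
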

Last update: 05-18-2017 06:37 PM