Seven Stars

Error handling for job

Hi All,

 

Can anyone please explain the concept of error handling?

How can I track child job errors in the parent job so that I can send an error mail or a success mail from the parent job?

Also, if one of the child jobs fails, the complete parent job should fail and an error message should be sent.

 

What I have noticed is that the other parallel jobs continue to process even if one of the jobs fails.

How can I make the complete parent job stop and fail if one of the subjobs fails?

 

I also need clarification on the point below:

1) What is the difference between OnSubjobError and OnComponentError in the parent job, and how should they be used?

 

Attached is the sample example.

 

Thanks

Twelve Stars

Re: Error handling for job

There are several ways to achieve this depending on how difficult you want to make it.

 

A way that I like to work is to create a couple of "reporting" jobs: one to be used at the beginning of each job (triggered from a tPreJob component) and one to be used at the end of each job (triggered from a tPostJob component). I have a back-end database, and every time a job starts or ends I use these jobs to log details about it. There is a lot you can log, but "job name", "start date", "end date", "status", "pid", "total rows", "success rows", "failed rows", etc. might be a nice place to start. The advantage of using the tPostJob to trigger your "end job" Job is that it will always run, even with a Java error, so you will always be able to log a status for the run. If you use the pid (process id) you can also link these results with the AMC functionality. I have written a piece on the AMC here (https://www.rilhia.com/tutorials/talend-activity-monitoring-console-amc). You may actually be able to get away with just using the AMC, but I use the above method to "hook in" to other bits and pieces.
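
As a rough plain-Java analogy of the start/end logging pattern described above (not Talend-generated code), the sketch below records a start row, runs the job body, and always records an end row with a status. The class name `JobRunLogger`, the method `runWithLogging` and the in-memory `LOG` list are hypothetical stand-ins for the reporting jobs and the back-end table.

```java
import java.util.ArrayList;
import java.util.List;

final class JobRunLogger {
    // Hypothetical stand-in for the back-end logging table (in memory only).
    static final List<String> LOG = new ArrayList<>();

    // Runs the job body and always records an end status, even when the body throws.
    static String runWithLogging(String jobName, Runnable jobBody) {
        LOG.add(jobName + " START");             // role of the tPreJob reporting job
        String status = "FAILED";
        try {
            jobBody.run();
            status = "SUCCESS";
        } catch (RuntimeException e) {
            // Status stays FAILED; a real job would also record the error message.
        } finally {
            LOG.add(jobName + " END " + status); // role of the tPostJob reporting job
        }
        return status;
    }

    public static void main(String[] args) {
        runWithLogging("good_job", () -> { });
        runWithLogging("bad_job", () -> { throw new IllegalStateException("boom"); });
        LOG.forEach(System.out::println);
    }
}
```

The `finally` block plays the part that the post attributes to tPostJob: the end-of-run row is written whether or not the body failed.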

 

With regard to killing the job immediately, you can either make use of the "Die on child error" functionality supplied within the tRunJob component, or you can make use of the "CHILD_RETURN_CODE" globalMap variable for the job. This can be useful in deriving your own logging logic. But essentially, the job will return a number which can be accessed using the following code....

 

((Integer)globalMap.get("tRunJob_1_CHILD_RETURN_CODE"))

....the number of the tRunJob changes depending on which one you are referencing. If this number is 0, everything is fine. You can set this number in the child job using the tDie component. The tDie will also end your job immediately.

 

The difference between OnSubjobError and OnComponentError is that OnSubjobError is triggered by an error anywhere in the subjob, while OnComponentError is triggered immediately by an error in the specific component it is attached to. An easy way to describe how they can be used differently: with an OnSubjobError you do one thing on any error within the subjob, whereas with an OnComponentError you can do different things depending on which component within the subjob fails. To be honest though, I rarely use them. They do not work as you would expect in many cases and are certainly not uniform in how they report on errors. For example, OnComponentError links work in very strange ways with some database components.

 

EDIT: I have just looked at your screenshot and feel I may have gone a little overboard with my brain dump here. You may be able to solve your problem quickly using RunIf links that check the status of the child job (using the code above) and connect to a tDie.

 

 

 

 

Rilhia Solutions
Seven Stars

Re: Error handling for job

:-)
Thanks Rhall for the information. However, my question is how to fail, stop or interrupt the parent job if any of the child jobs fails.
I have attached the sample example with your suggestion to use RunIf. Is this what you meant? It is not working, so please let me know what mistake I am making.

Otherwise, please let me know the design changes needed in the sample example to get the required output.


Thanks

Twelve Stars

Re: Error handling for job

Yes, that was why I added the EDIT at the end of my post. If you use RunIf links after your child jobs and set the code in the RunIf to respond to the status of the child job (shown in the code section in my first description, but shown below for the RunIf code) .....

 

((Integer)globalMap.get("tRunJob_1_CHILD_RETURN_CODE"))!=0

....you can connect to a tDie and set your error message and error code. You will need to switch off the "Die on child error" tick box for this method to work.
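
One caveat with the RunIf expression above is that the cast will throw a NullPointerException if the globalMap entry is missing (for instance, if that tRunJob never ran). Below is a minimal stand-alone sketch of a null-safe version of the same check. The `HashMap` here only simulates Talend's globalMap, and the class and method names (`ChildReturnCode`, `childFailed`) are made up for illustration. Treating a missing entry as "not failed" is an assumption of this sketch, not Talend behaviour.

```java
import java.util.HashMap;
import java.util.Map;

final class ChildReturnCode {
    // Returns true when the named tRunJob reported a non-zero return code.
    // A missing or non-Integer entry is treated as "not failed" (sketch assumption).
    static boolean childFailed(Map<String, Object> globalMap, String tRunJobName) {
        Object code = globalMap.get(tRunJobName + "_CHILD_RETURN_CODE");
        return code instanceof Integer && ((Integer) code) != 0;
    }

    public static void main(String[] args) {
        // Simulated globalMap; inside a Talend job this map already exists.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tRunJob_1_CHILD_RETURN_CODE", 0);
        globalMap.put("tRunJob_2_CHILD_RETURN_CODE", 4);

        System.out.println(childFailed(globalMap, "tRunJob_1")); // false
        System.out.println(childFailed(globalMap, "tRunJob_2")); // true
        System.out.println(childFailed(globalMap, "tRunJob_3")); // false (no entry)
    }
}
```

Inside a RunIf link the same idea can be written inline as `globalMap.get("tRunJob_1_CHILD_RETURN_CODE") != null && ((Integer)globalMap.get("tRunJob_1_CHILD_RETURN_CODE")) != 0`.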

 

I would like to point out that leaving your child jobs unconnected does not make them run in parallel. It is actually a bad idea to do this. All that will happen is that Talend will run them in the order (I believe) in which they were dropped onto the workspace. It will not be true parallel running.

Rilhia Solutions
Twelve Stars

Re: Error handling for job

What your example proves is that unconnected subjobs do not run in parallel. One will run after the other. Your second job is running before your 1st job. You need to use the Enterprise Edition to get parallel running.

Rilhia Solutions
Seven Stars

Re: Error handling for job

Yeah, thanks, the RunIf works if "Die on child error" is unchecked.

Regarding parallel running:
The child jobs do execute in parallel as per my output files (multiple files are generated at the same time due to parallel execution).

But in any case, since these child jobs belong to the same parent job, is there any way to interrupt or disconnect the parent job (stop parent job execution) if any of the child jobs fails?
Seven Stars

Re: Error handling for job

Here, I have modified the sample example to display a starting message for each child job, to show that the child jobs are running in parallel.

Twelve Stars

Re: Error handling for job

@vidya821 they are not running in parallel by just dropping them into a job unconnected. You can test this. Create a simple child job with one component, a tJava component. In the tJava component add this code....

 

System.out.println("Start job");
Thread.sleep(10000);

Now put two copies of this job into a parent job. You will see "Start job" printed immediately, then 10 seconds later "Start job" is printed again, and then 10 seconds later the parent job will finish. They are definitely run sequentially.
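
The timing test above can also be reproduced outside Talend. The plain-Java sketch below is only an analogy and does not model how Talend schedules subjobs. It runs the same print-then-sleep body twice in sequence and then on two threads, and reports the elapsed time for each. The class and method names are hypothetical.

```java
final class SequentialVsThreads {
    // Stand-in for the tJava body in the post: print, then sleep.
    static void childJob(long sleepMillis) {
        System.out.println("Start job");
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Runs two child jobs one after the other and returns elapsed milliseconds.
    static long runSequentially(long sleepMillis) {
        long start = System.nanoTime();
        childJob(sleepMillis);
        childJob(sleepMillis);
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Runs two child jobs on separate threads and returns elapsed milliseconds.
    static long runOnThreads(long sleepMillis) {
        long start = System.nanoTime();
        Thread a = new Thread(() -> childJob(sleepMillis));
        Thread b = new Thread(() -> childJob(sleepMillis));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("sequential ms: " + runSequentially(1000));
        System.out.println("threads ms:    " + runOnThreads(1000));
    }
}
```

With a one second sleep, the sequential run should take roughly two seconds and the threaded run roughly one, which is the difference to look for when checking whether two child jobs really overlap.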

Rilhia Solutions
Seven Stars

Re: Error handling for job

@rhall_2_0
Understood. The parallel execution you are talking about is using the same job twice but with a different context, say,
whereas the one I am referring to is two sets of different jobs executed in parallel...
Am I correct?
Thanks
Twelve Stars

Re: Error handling for job

No. You can create two different jobs with the code I showed you (or completely different code....with a sleep so you can see it) and they will still run one after the other.

Rilhia Solutions
Seven Stars

Re: Error handling for job

Could you please check the attachment that I have shared, Modified_sample_test_example?

I am not sure how the parallel execution works then!

Note: I have ticked the "Multi thread execution" option.

Twelve Stars

Re: Error handling for job

Oh I see. Sorry, my mistake. I should have asked you a bit more about your config. My assumption was that you were simply dropping the jobs in and expecting parallel processing. 

Rilhia Solutions
Twelve Stars

Re: Error handling for job

Anyway, going back to this question.....


But in any case, since these child jobs belong to the same parent job, is there any way to interrupt or disconnect the parent job (stop parent job execution) if any of the child jobs fails?


....you can tick "Exit virtual machine immediately" in the tDie's Advanced settings. But you will need to ensure that you have switched off the "Die on child error" of the job which feeds the tDie that does this.

Rilhia Solutions
Seven Stars

Re: Error handling for job

Thanks Rhall,

Sorry, what I was thinking is that the other child job execution (Sample test 2) should stop due to the error in Sample test 1, but control should go back to the parent job, where I can send a mail with an error message.

Is this somehow possible? (The job should not be closed immediately; control should go back to the parent job with the error from the child jobs, and I can use that error to trigger an error mail.)

This could solve a big problem of mine!!

Can you also let me know whether the exit JVM option, if checked, will stop the whole job?
For example, if I call two jobs in the parent job, get an error in parent 1, and use the exit JVM option there, then it will also stop parent 2's execution, right?

 

To summarize, the requirement is below.

The parent jobs are running with multi thread execution.

1) Parent Job 1

            1) Child Job 1-1

            2) Child Job 2-1

2) Parent Job 2

            1) Child Job 1-2

            2) Child Job 2-2

Situation: if Child Job 1-1 fails, then Child Job 2-1 should stop processing and control should return to Parent Job 1, which should trigger an email since there was an error in Child Job 1-1.

For Parent Job 2, the process should complete with a success email message.

 

 

Thanks

Twelve Stars

Re: Error handling for job

There is a way, but you will need to really pay attention to controlling this. For child jobs 1-1, 1-2, 2-1 and 2-2 use the mechanism I described where you connect to a tDie and terminate the JVM (tDie option). This will kill the other job within that parent job. But for Parent Jobs 1 and 2 set the "Use independent process to run subjob". That should allow your parent jobs (1 and 2) to report on what has gone wrong in your top level parent job. 
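
The reason the independent process setting helps is that exiting a JVM only ends the process it runs in. Below is a plain-Java sketch of that idea, not of Talend's own launcher code: a parent starts a separate JVM whose main method calls `System.exit`, then reads the exit code and carries on. It assumes Java 11 or later with a JDK (it relies on single-file source launch and `Files.writeString`), and the class names `IndependentProcessDemo` and `Child` are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class IndependentProcessDemo {
    // Starts a separate JVM whose main method calls System.exit(exitCode),
    // then returns the exit code as seen by the calling (parent) JVM.
    static int runChildThatExits(int exitCode) {
        try {
            // Write a tiny child program to a temp file (hypothetical child job).
            Path dir = Files.createTempDirectory("child-jvm");
            Path source = dir.resolve("Child.java");
            Files.writeString(source,
                "public class Child { public static void main(String[] a) { System.exit("
                    + exitCode + "); } }");

            // Launch it with the same Java installation that runs the parent.
            String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
            Process child = new ProcessBuilder(javaBin, source.toString())
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
            return child.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int code = runChildThatExits(4);
        // The parent is still alive here and can react, e.g. by sending an error mail.
        System.out.println("parent still running, child exit code = " + code);
    }
}
```

If the child ran in the parent's own JVM, the same `System.exit` call would end the parent as well, which matches the behaviour described above for jobs that share a process.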

Rilhia Solutions
Seven Stars

Re: Error handling for job

Awesome, Thanks Rhall
This worked !!
Twelve Stars

Re: Error handling for job

No problem. Sorry about the misunderstanding previously. I need to pay attention to the rule....

"Never assume, because it makes an 'ass' out of 'u' and 'me'" :-)

Rilhia Solutions