Cleanly terminate all child jobs invoked by the same parent when any child job dies

Four Stars MPR

Apologies if this has been asked before. I'm using a parent job that invokes tRunJob for each job in a list of jobs specified in a database table. This requires "Use dynamic job", which in turn requires an independent JVM for each child job. I want to invoke these jobs in parallel, which is easily done with a parallel iteration component.

The problem is that, as far as I can tell, Talend has no built-in way to communicate a "kill" message to a child job running in an independent JVM. I would have expected the Talend code to contain handling for Thread.interrupt(), but it doesn't seem to. This is a problem if one child fails and you want the parent to fail in turn, because another child can be left running as a zombie, potentially for a long time.

So to sum up: if I invoke four child jobs in independent JVMs in parallel from a single parent, and one of the child jobs dies, is there any easy way to force a "die" command on all the children?

I know I could put a custom looping listener in each child (I would prefer not to, as the complexity is not worth the effort). I know I could also have the parent job collect all of the system_pids from the tStatCatcher in a tPostJob flow and run an OS-level taskkill /F /PID [pid] on each of the other children, but this is obviously not ideal either. I feel like Talend should have some way to manage child threads. Am I missing anything?
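For readers weighing the PID-based workaround described above, here is a minimal sketch of what the kill step could look like in plain Java, for example inside a tJava component in the parent's tPostJob flow. It is an illustration under assumptions, not something Talend provides: it assumes a Java 9+ runtime (for `ProcessHandle`), and the class name `ChildJobTerminator`, the method `terminateAll`, and the `childPids` list (the PIDs collected from the stat catcher) are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Talend. Requires Java 9+ for ProcessHandle.
final class ChildJobTerminator {

    /**
     * Requests termination of every live process in childPids and returns
     * the PIDs for which the request was successfully issued.
     * The current JVM's own PID is always skipped.
     */
    static List<Long> terminateAll(List<Long> childPids, boolean force) {
        long self = ProcessHandle.current().pid();
        List<Long> signalled = new ArrayList<>();
        for (long pid : childPids) {
            if (pid == self) {
                continue; // never try to kill the parent job itself
            }
            Optional<ProcessHandle> handle = ProcessHandle.of(pid);
            if (handle.isPresent() && handle.get().isAlive()) {
                ProcessHandle h = handle.get();
                // destroy() asks for a normal shutdown where the platform
                // supports it; destroyForcibly() is the hard kill.
                boolean requested = force ? h.destroyForcibly() : h.destroy();
                if (requested) {
                    signalled.add(pid);
                }
            }
        }
        return signalled;
    }
}
```

A call such as `ChildJobTerminator.terminateAll(childPids, true)` behaves roughly like running taskkill /F on each PID, but works on any OS the JVM supports. It still kills the children at an arbitrary point in their processing, so it only makes sense when the child jobs are safe to interrupt.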


Accepted Solutions
Community Manager

Re: Cleanly terminate all child jobs invoked by the same parent when any child job dies

I don't believe this to be an omission, if I'm honest. This is an unusual requirement for data integration, but it can be achieved using the methods you have suggested. If I am running parallel processes which can be run independently, it suggests that the separate data paths are not related in terms of the time of processing or the data that is processed (... until you get to a synchronisation-type child job). Now, if one of those processes ends through failure, that failure should be logged. I imagine you are doing that. But if Talend were to then arbitrarily end the other currently running processes, you would have little control over when or where they are ended. They may have already completed, or they may not have started. This is generally not a good idea. It would be better to have a synchronisation job which is made aware of the error and carries out the clean-up. I can understand scenarios where you may want to arbitrarily stop everything, but those scenarios would be far more unusual than letting the jobs finish their natural flow.

 


All Replies
Four Stars MPR

Re: Cleanly terminate all child jobs invoked by the same parent when any child job dies

I guess I sort of agree with you. I do acknowledge that for 99% of scenarios, what you are suggesting is the standard expected behavior, and it is for me in most cases as well. I'm new to Talend and come from an SSIS background, so I thought maybe I was missing something obvious.

The issue I am trying to solve is around clarity of job state when using multiple parallel children, and ease of restartability upon failure. Given an ETL process that is designed for end-to-end idempotence, and a backfill or other large-volume scenario, it seems reasonable to want all child jobs to fail together, i.e. to have any failure be a hard stop across all processes. This is not easily done with parallel dynamic tRunJob components.

If one child fails, and a developer wants to restart the process immediately after addressing the issue, how does one clearly communicate to that person which child jobs are in fact still running, and provide the ability to terminate them without running manual OS or DB kill commands for each thread? You definitely do not want a process restarted while unaffected children are still running, which is the scenario I'd like to avoid.
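One way to answer the "which children are still running?" question before a restart is a small status check over the recorded child PIDs. The sketch below is only an illustration under assumptions: it assumes a Java 9+ runtime and that the child PIDs were logged somewhere the restart step can read them, and `ChildJobStatus`, `stillRunning`, and `childPids` are hypothetical names, not Talend features.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Talend. Requires Java 9+ for ProcessHandle.
final class ChildJobStatus {

    /** Returns the subset of the given PIDs whose processes are still alive. */
    static List<Long> stillRunning(List<Long> childPids) {
        List<Long> alive = new ArrayList<>();
        for (long pid : childPids) {
            // ProcessHandle.of returns an empty Optional when no such process exists.
            boolean running = ProcessHandle.of(pid)
                    .map(ProcessHandle::isAlive)
                    .orElse(false);
            if (running) {
                alive.add(pid);
            }
        }
        return alive;
    }
}
```

A restart step could refuse to proceed while `stillRunning(childPids)` is non-empty and print the remaining PIDs for the developer. Because operating systems reuse PIDs, a check like this is only trustworthy shortly after the children were launched, or when combined with another identifier such as the process start time.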
