Job does not die properly on child job error



I have a weird situation where a job failed to die properly when a child of a child job died.
Please see the attachments for how the job is set up. Essentially, this is the main job, which calls jobs that in turn run a multitude of child jobs. The problem is that the whole job does not die when a child job inside a tJob errors out. I have "Die on child error" checked for all of the relevant tJobs. I've attached the results of a query on the logcatcher DB: it reports that the jobs and their parent jobs errored and died, yet the process continued to run (even though the links between the jobs are set to OnSubjobOk).
The job is run with TIS 2.3.3 through the Job Conductor admin page. When the subjobs error, the dashboard reports them as errored, and even the main job shows up as having an error. However, the job keeps running even though it should have died because of the child error.
This seems to happen only when calling jobs within jobs. In the attached image, when the error occurred at the job that calls the CompressFTP job (which simply calls a single job, rather than jobs within jobs like the other tJobs in that job), the process died correctly and did not continue.
Why would the job continue running when there were errors on child jobs?

Re: Job does not die properly on child job error

A little more information on this one.
The job will die properly if all of the jobs within the called job error. However, if only some of them error, the process continues.

Re: Job does not die properly on child job error

I also modified the job that just calls multiple child jobs so that every tJob is linked to a tDie with an OnComponentError link (I used these instead of OnSubjob links because with OnSubjob you can only link one tJob to a tDie). Attached is the modification I made to that job.
I have also attached an excerpt from the dashboard, which shows that s_WorkFlowManager_jobs reported an error, as did TXQueueMetrics, the job within that job. What I cut off were the jobs that continued to run afterwards, and the fact that no error was reported for the WorkFlowManager_Main job.
So according to the dashboard, the WorkFlowManager_jobs job errored. In the screenshots I posted above, that job is linked to the email component with an OnComponentError link, which isn't triggered unless all jobs within WorkFlowManager_jobs error. Meanwhile, the OnSubjobOk link to WorkFlowManager_jobs_rpt is triggered, even though the dashboard says WorkFlowManager_jobs errored out. Please let me know if anyone has recommendations on how to get the error handling to work correctly.

Re: Job does not die properly on child job error

Alright, I think I understand what's going on, though I'm not a fan of my solution and would really like another fix. After running some tests, this is what I believe is happening.
The main job seems to error only when the last job in the batch returns an error. It looks as if each job (out of the 27) resets the error result of the tJob component. So unless the jobs are linked in a chain, an error gets overwritten by the next job in the batch, and unless the last job errors out, the component assumes the entire batch completed successfully. Linking them in a chain ensures that if there is an error, it is the last result reported to the component, so it triggers the error condition.
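To make the suspected behaviour concrete, here is a minimal Java simulation. It is plain Java, not Talend's generated code, and it assumes the diagnosis above is right: the batch keeps a single shared status that every child job overwrites. `ChildJob`, `lastWriterWins` and `anyFailure` are hypothetical names used only for illustration; a return value of 0 means success and anything else means an error.

```java
import java.util.List;

public class ErrorStatusDemo {

    // Hypothetical stand-in for a child job: 0 = success, non-zero = error.
    interface ChildJob {
        int run();
    }

    // Suspected behaviour: one shared status that each child overwrites,
    // so only the last child's result survives.
    static int lastWriterWins(List<ChildJob> jobs) {
        int status = 0;
        for (ChildJob job : jobs) {
            status = job.run(); // any earlier error is lost here
        }
        return status;
    }

    // Alternative: keep the first non-zero code so any failure is remembered.
    static int anyFailure(List<ChildJob> jobs) {
        int status = 0;
        for (ChildJob job : jobs) {
            int code = job.run();
            if (code != 0 && status == 0) {
                status = code; // remember the first failure, ignore later successes
            }
        }
        return status;
    }

    public static void main(String[] args) {
        // Batch of three child jobs where only the middle one fails.
        List<ChildJob> batch = List.of(() -> 0, () -> 1, () -> 0);
        System.out.println("last-writer-wins status: " + lastWriterWins(batch)); // prints 0
        System.out.println("any-failure status: " + anyFailure(batch));         // prints 1
    }
}
```

With the batch {0, 1, 0}, the first method reports success because the middle failure is overwritten, while the second reports the failure. Chaining the jobs avoids the first behaviour only because the chain stops at the failing job, which then becomes the last one to report.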
The problem with chaining them like this is that it removes the ability to run them multithreaded. It also complicates activating/deactivating jobs in the batch, because the links between components have to be remade whenever a job is activated or deactivated.
Is there a better solution to getting the error handling to work in the main job without having to link all the jobs in the batch together in a single flow?