The Job in Execution Plan is stuck in Running state after TAC loses the DB connection

Scenario

  • Create Task1 and Task2 and create an Execution Plan: Task1 > onOk > Task2.

  • Run the Execution Plan: observe that the status of Task1 is Running.

  • Before Task1 ends, stop and restart the TAC DB.

  • TAC briefly loses the DB connection, but recovers it once the DB is back.

  • Check the execution logs on the JobServer side, and notice that Task1 has already ended.

  • In TAC, however, the Task1 status is still Running, and is stuck in this status.

  • Check the Execution Plan, and notice the status is still Running, and is stuck in this status.

You need TAC to seamlessly sync the Task's status with the one from the JobServer, and any Execution Plan needs to continue as expected.
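
The expected behavior can be illustrated with a minimal sketch. This is not TAC's implementation or the content of the patch; the Status values, the Task class, and the reconcile and next_tasks functions are all hypothetical names introduced only to show the idea: after the connection is restored, any task still marked Running is compared with the final status reported by the JobServer, corrected if needed, and the onOk trigger is then evaluated again.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status values, for illustration only.
    READY = "Ready"
    RUNNING = "Running"
    OK = "Ok"
    ERROR = "Error"


@dataclass
class Task:
    name: str
    status: Status


def reconcile(tac_tasks, jobserver_statuses):
    """Copy the JobServer's final status onto tasks still shown as Running.

    Returns the names of the tasks whose status was corrected.
    """
    corrected = []
    for task in tac_tasks:
        actual = jobserver_statuses.get(task.name)
        # Only touch tasks that are marked Running on the TAC side but
        # that the JobServer reports as finished (successfully or not).
        if task.status is Status.RUNNING and actual in (Status.OK, Status.ERROR):
            task.status = actual
            corrected.append(task.name)
    return corrected


def next_tasks(plan, tasks_by_name):
    """Return successors whose onOk predecessor ended with OK.

    `plan` is a list of (predecessor, successor) name pairs, a
    simplified stand-in for an Execution Plan such as Task1 > onOk > Task2.
    """
    return [
        succ
        for pred, succ in plan
        if tasks_by_name[pred].status is Status.OK
        and tasks_by_name[succ].status is Status.READY
    ]
```

In this sketch, once reconcile has corrected Task1 from Running to Ok, next_tasks returns Task2 for the plan Task1 > onOk > Task2, which is the continuation the scenario expects.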

 

Solution

For Talend 6.4.1, the solution is to apply Patch_20171124_TPS-2253_v1-6.4.1.zip.

  1. Contact Talend Support to request patch Patch_20171124_TPS-2253_v1-6.4.1.zip.

  2. Follow the steps in the Readme file (included in the patch zip file) to apply the patch.

Note: You may need to clear your browser cache for the patch to take effect.

Version history
Revision #: 5 of 5
Last update: 11-12-2018 09:09 AM
Comments

I'm experiencing the same issue. We've had network problems - not Talend related - where we lose connectivity to the TAC database for a brief period. The result is exactly what you described - Plans hanging in a "Running" status when in fact they have completed, but the next trigger never fires because Talend believes they're still running. We've also seen this on individual jobs (Tasks) that happen to be executing when the connection is lost - and they get stuck as well.

 

The "cause" of the issue is not Talend related - however, the fact that it can't recover is a problem. I've been able to use a workaround I saw posted here suggesting you go into the TAC database and manually reset the status - which works - but I HATE modifying a vendor database.

 

I'll request this patch and hopefully it resolves the issue - because it's a pain to deal with when we have jobs running every 60 seconds in some cases and they all get hung up and have to be reset.

 

Really glad I found this - Thanks!