Environment is TIS-6.2.1 running on x86_64-redhat-linux-gnu. When a recurring Task is executed multiple times by a recurring Trigger, the Task occasionally, at seemingly random times, does not end cleanly and instead remains in the limbo status "Waiting for the task to end". This status never changes and prevents the recurring Trigger from activating the Task again. Ironically (and incorrectly), the JobServer log shows the Task as successfully completed.
Question: Do any of the Apache Tomcat or other JobServer log files capture any indication of what the Task is doing when "stuck" in this state?
1. This behavior has been observed across multiple Jobs/Tasks, not just one.
2. This behavior has been observed regardless of recurring Trigger type (Simple and CronUI).
3. The only workaround to un-freeze the Task is to bounce the Apache Tomcat Service.
Any information that you can provide will be helpful. Thank you!
Is there any error message in the TAC log, which is located in <ServersInstallationPath>\apache-tomcat-XXX\logs?
Could you please create a case on the Talend support portal so that we can give you remote assistance (WebEx) with priority through the support cycle?
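For anyone who wants to automate the log check suggested above, here is a minimal Python sketch. It is not a Talend tool: the function name `scan_logs`, the keyword list, and the optional date filter are all assumptions made for illustration, and the keywords in particular should be adjusted to whatever your TAC and Tomcat logs actually contain.

```python
import re
from pathlib import Path

# Assumed markers of trouble; these are guesses, not an official Talend list.
SUSPICIOUS = re.compile(r"ERROR|FATAL|Exception|timed? ?out", re.IGNORECASE)


def scan_logs(log_dir, date_strings=()):
    """Return (file name, line number, text) for each suspicious log line.

    date_strings is optional. When given, only lines that contain one of
    the strings are reported, so write the dates in the same format the
    log uses. Continuation lines without a date (for example stack trace
    lines) are skipped when this filter is active.
    """
    hits = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file has odd bytes.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if date_strings and not any(d in line for d in date_strings):
                    continue
                if SUSPICIOUS.search(line):
                    hits.append((path.name, number, line.rstrip()))
    return hits
```

To use it, call `scan_logs` with the path to the apache-tomcat logs folder under your own installation path and print the returned tuples. Passing the dates on which the Task got stuck narrows the output to those days.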
Thank you for your reply! When it happened, I did not see any unusual errors in <ServersInstallationPath>\apache-tomcat-XXX\logs. I will create a support ticket.
Feel free to let us know if there is any further help we can give.
We have seen this behavior on our Windows-based system as well, and agree that the only solution we've found is to 'bounce the service'.
We have also seen some jobs remain as running in the history log but TAC thinks they are finished and continues to run the schedule as normal.
In both instances we've not logged a call, partly because it doesn't seem to follow a pattern, and we suspect it is actually a 'blip' in communication between the TAC and database servers.
Thank you Bill for the reply! It is good to know we are not alone.
In our case, the TAC and the job server reside on different boxes. When it occurred, the job had finished on the job server and the Talend database was updated correctly, but the TAC server was not updated with the completed status. My guess is that the job-done confirmation sent by the Job Server was somehow missed by the TAC server. Could it be a network glitch?
We are waiting for another glitch and doing more troubleshooting. I will update if we find the cause.
Have you already created a case on the Talend support portal? Is there any solution or update from the support team?
Our setup is much the same as yours, and we had the same idea about network glitches. Not sure anyone will ever find what causes it. I'm about to log a support ticket, as it happened on one job over the weekend.
Also when I restarted the service I noticed that the Time Line started to report that a number of other jobs had had issues in the intervening time as well as 'cleaning up' the line for the 'troubled' job. Ticket (00082203) is now logged with support.
The above rather looks like Apache is getting its knickers in a twist, as the service took forever to stop - even Windows got bored of waiting and gave up on the restart!
I have not created a support ticket yet. We decided to wait until the next glitch, then gather as much information as possible and create a ticket. We have been using Talend for several years across different versions. June 29 was the first time we hit this issue; it also happened on August 15 and 29. I will update here once I have made any progress.
Thank you for the update! If our glitch occurs again, I will pay attention to the service restart log and see whether we observe the same behavior as you did.
Thanks for your update and feel free to let us know your progress.