This is an outage event:
* On Monday morning, I noticed that we could not connect to the TAC server.
* Restarted the TAC server: /etc/init.d/talend-tac-6.3.1 restart
* No jobs had run for a day, since Sunday.
* Noticed that the CPU load on one job server was high. Something was wrong on that job server; rebooting it fixed the issue.
* The cause:
1. Job server ONE had an issue;
2. This caused the TAC server to hang;
In catalina.log, I have seen a lot of errors.
"11-Nov-2017 12:49:23.760 WARNING [pool-6-thread-1] org.drools.persistence.jta.JtaTransactionManager.commit Unable to commit transaction
bitronix.tm.internal.BitronixRollbackException: transaction timed out and has been rolled back
at org.drools.persistence.jta.JtaTransactionManager.commit(JtaTransactionManager.java:226) "
3. After that, no jobs were triggered, even though I have other job servers.
* In my setup, TAC and job servers are on different independent servers, and the job servers are not in cluster mode. (BTW, I'm not sure which product includes cluster mode.)
We have OS-level and DB-level monitoring, but nothing picked up this issue. My question is: in general, how do I monitor the health of the TAC server and the job servers?
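One signal that OS and DB monitoring would not see is the TAC log itself. The following is only a minimal sketch, assuming that the "Unable to commit transaction" warning quoted above is a useful symptom to alert on and that catalina.log lines start with a timestamp in the format shown in the excerpt. The function name, the 15-minute window, and any alert threshold are hypothetical choices, not anything Talend ships.

```python
import re
from datetime import datetime, timedelta

# Timestamp at the start of a log line, as in the excerpt above:
# "11-Nov-2017 12:49:23.760 WARNING ..." (an optional leading quote is tolerated).
TS_RE = re.compile(r'^"?(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})\.\d{3} ')

# Text taken from the warning quoted in this thread.
MARKER = "Unable to commit transaction"


def count_recent_commit_failures(lines, now, window=timedelta(minutes=15)):
    """Count marker lines whose timestamp falls within `window` before `now`.

    `lines` is any iterable of log lines (for example, an open file).
    Month names are parsed with %b, which assumes an English/C locale.
    """
    count = 0
    for line in lines:
        if MARKER not in line:
            continue
        match = TS_RE.match(line)
        if not match:
            continue  # continuation line or unexpected format
        ts = datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S")
        if timedelta(0) <= now - ts <= window:
            count += 1
    return count
```

A cron job could open the TAC's catalina.log, pass the file object and datetime.now() to this function, and raise an alert when the count goes above whatever threshold is normal for your installation.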
Have you already checked the Talend Help Center topic on Talend Activity Monitoring to see if it is what you are looking for?
Hello, I have the same issue: very high CPU consumption, even though we restarted only 7 days ago.
Do you have a solution?
"Hello, I have the same issue: very high CPU consumption, even though we restarted only 7 days ago."
1. I set up a standalone Talend job (not managed by TAC). It checks the Activity Monitoring Console DB for job health.
2. I also set up a way to remotely call a simple job on the job server to check whether the job server is healthy.
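For readers who want a starting point, the two checks above could be approximated outside Talend with a small script. This is only a rough sketch under stated assumptions, not the setup described above: the TCP probe only shows that the job server's port accepts connections (a weaker test than actually running a job), and the staleness check assumes you have already fetched the timestamp of the most recent job run from the Activity Monitoring Console DB with your own query. The function names, the 2-hour limit, and the host and port in the usage note are all hypothetical.

```python
import socket
from datetime import datetime, timedelta


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def is_stale(last_run, now, max_age=timedelta(hours=2)):
    """Return True if no job run is known, or the latest one is older than `max_age`.

    `last_run` is the timestamp of the most recent job run, as read from the
    monitoring database by the caller (None if no row was found).
    """
    return last_run is None or now - last_run > max_age
```

A scheduler outside TAC could call port_is_open("jobserver1.example.com", 8000) for each job server, using whichever port your job server is actually configured with, and call is_stale(last_run, datetime.now()) with the latest timestamp from your monitoring tables, then send an alert if either check fails. Running this outside TAC matters because a check scheduled by TAC would stop along with it.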
There is not enough information for us to diagnose your issue. Could you please create a case on the Talend support portal so that we can give you remote assistance through the support cycle with priority?