
How to monitor the health of the TAC server and job servers?

Hi 

This is an outage event:

* On Monday morning, I noticed that we could not connect to the TAC server.
* I restarted the TAC server: /etc/init.d/talend-tac-6.3.1 restart
* No jobs had run for a day, since Sunday.
* I noticed that the CPU load on one job server was high. There was an issue on this job server. Rebooting this job server fixed the issue.
* The cause: 

     1. Job server ONE had an issue;

     2. This caused the TAC server to hang;

In catalina.log, I saw a lot of errors like this one:

"11-Nov-2017 12:49:23.760 WARNING [pool-6-thread-1] org.drools.persistence.jta.JtaTransactionManager.commit Unable to commit transaction
bitronix.tm.internal.BitronixRollbackException: transaction timed out and has been rolled back
at bitronix.tm.BitronixTransaction.commit(BitronixTransaction.java:250)
at bitronix.tm.BitronixTransactionManager.commit(BitronixTransactionManager.java:143)
at org.drools.persistence.jta.JtaTransactionManager.commit(JtaTransactionManager.java:226) "        

    3. After that, no jobs were triggered, even though I have other job servers.

* In my setup, the TAC and the job servers run on separate, independent servers, and the job servers are not in cluster mode. (BTW, I'm not sure which product includes cluster mode.)

    

We have OS-level and DB-level monitoring, but neither picked up this issue. My question is: in general, how do you monitor the health of the TAC server and the job servers?

 

Thanks,

 

1 REPLY
Moderator

Re: How to monitor the health of the TAC server and job servers?

Hello,

Have you already checked TalendHelpCenter:Talend Activity Monitoring to see if it is what you are looking for?
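
Separately from that, one low-tech option is to watch catalina.log for the rollback warning quoted in the question and raise an alert when it shows up repeatedly. Below is a rough sketch in Python, not anything that ships with Talend. The function names and the 15-minute window are hypothetical choices. It also assumes every log entry starts with a timestamp in the same layout as the excerpt, and that the month abbreviation is parsed under an English locale.

```python
from datetime import datetime, timedelta

# Marker text copied from the catalina.log excerpt in the question.
ROLLBACK_MARKER = "transaction timed out and has been rolled back"
# Timestamp layout seen in the excerpt, e.g. "11-Nov-2017 12:49:23.760".
# Assumption: "%b" is parsed under an English locale.
TS_FORMAT = "%d-%b-%Y %H:%M:%S.%f"


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    parts = line.lstrip('"').split(" ", 2)
    if len(parts) < 2:
        return None
    try:
        return datetime.strptime(parts[0] + " " + parts[1], TS_FORMAT)
    except ValueError:
        return None


def count_recent_rollbacks(lines, now, window=timedelta(minutes=15)):
    """Count rollback messages whose nearest preceding timestamp
    falls within `window` before `now`."""
    count = 0
    last_ts = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            last_ts = ts  # stack-trace lines inherit this timestamp
        if ROLLBACK_MARKER in line and last_ts is not None:
            if timedelta(0) <= now - last_ts <= window:
                count += 1
    return count
```

A cron job or an existing monitoring agent could call count_recent_rollbacks on the lines of catalina.log with the current time and alert when the count goes above a threshold you choose. This only catches the specific symptom seen in this outage, so treat it as a complement to proper monitoring rather than a replacement.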

Best regards

Sabrina
