Alerting on routes for faults, missed trigger, missed consumer, etc

I am working on porting some mediation processes from a different ESB into Talend ESB. I need to be able to alert on the following type of events:

Fault - a route exited with a fault.
Missed Trigger - a route was not triggered as expected, e.g. a route should run once every 5 minutes but more than 5 minutes have passed since it last ran.
Missed Consumer - a route started but did not finish within a configured amount of time.

The 'Fault' condition seems pretty straightforward to monitor, but I'm not sure about Missed Trigger and Missed Consumer. There are two complications: 1) we will be running in a distributed environment, where a route may run on any one of 4 servers, and 2) we need to be able to detect Missed Trigger and Missed Consumer even if the server that would normally run the route is completely offline and inaccessible. For example, if a route starts execution on a server and the server dies completely during execution, I need to be able to detect that the route did not finish, even if that server never comes back online (meaning I can't look at logs for this purpose or depend on anything on the server to help me).
We currently accomplish this using a separate, home-grown monitoring system that watches for process-started and process-ended events from the existing ESB. It correlates events based on message IDs and correlation IDs, and with that information it can determine whether a process has not finished, or has not been kicked off, within X amount of time.
Our monitoring system listens on JMS because the existing ESB is tightly bound to JMS. Each process has entry and exit topics, so it is straightforward to have a monitoring system watch those topics to see the traffic on the ESB.
Any ideas how we might accomplish this in Talend/Camel? In theory, we could put wiretaps in all the routes and send messages to JMS, but I'm wondering if there is a more elegant way that does not require configuring wiretaps and JMS clients in all the mediation routes. For example, there may be Camel context level events we can configure and listen to (preferably still over JMS).
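To make the Missed Consumer check concrete, here is a minimal sketch of the correlation logic described above, written in plain Java with no JMS wiring. The class name MissedConsumerMonitor and its methods are hypothetical and only for illustration; they are not part of the existing monitoring system or of Talend/Camel. The idea is that a "started" event records a correlation ID, an "ended" event clears it, and anything still recorded after the configured time is overdue.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: tracks executions by correlation ID and reports overruns. */
class MissedConsumerMonitor {
    // correlation ID -> time the "started" event was observed
    private final Map<String, Instant> inFlight = new TreeMap<>();
    private final Duration maxRunTime;

    MissedConsumerMonitor(Duration maxRunTime) {
        this.maxRunTime = maxRunTime;
    }

    /** Call when a message is seen on a process entry topic. */
    synchronized void onStarted(String correlationId, Instant at) {
        inFlight.put(correlationId, at);
    }

    /** Call when the matching message is seen on the exit topic. */
    synchronized void onEnded(String correlationId) {
        inFlight.remove(correlationId);
    }

    /** Correlation IDs that started more than maxRunTime before 'now' and have not ended. */
    synchronized List<String> overdue(Instant now) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Instant> e : inFlight.entrySet()) {
            if (Duration.between(e.getValue(), now).compareTo(maxRunTime) > 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Because the state lives in the monitor rather than on the server running the route, an execution whose server dies simply never produces an "ended" event and shows up in overdue() once the limit has passed.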

Re: Alerting on routes for faults, missed trigger, missed consumer, etc

Hi Eric
You have a few issues bundled together here.  Let's start with #1, fault handling.  Fault handling is indeed pretty easy with Camel, but keep in mind that if you want to catch faults on endpoints you have to enable handleFault on the route.  Otherwise faults returned by endpoints will not be passed to the Camel error handlers.  In Talend Studio you can do this with the cJavaDSLProcessor component.  Put it right after the Camel consumer endpoint that starts the route and use ".handleFault()" (without the quotes) as the content of the component.
For Missed Trigger, I would be surprised if either the Timer or the camel-quartz component missed a trigger event.  One thing I would suggest in this case is to keep the trigger as simple as possible: it should just put the command message onto a JMS queue.
Regarding JMS, it sounds like you have a good understanding and a framework in place, so I would stick with your current approach and just implement it in Camel.  For the Missed Trigger case, for example, have the timer put the command message onto the message queue.  There can be multiple consumers consuming from the queue, and those consumers can be on any of the 4 servers you mention.
In most cases a command message should be pretty much stateless: all of the parameters should be in the message.  If they are not there already, either enrich the message or, if you prefer, use the claim check pattern and have the consuming endpoint retrieve the relevant data.
The wiretap is a legitimate approach, but it is invasive and messy: you clutter up your business and workflow logic with management and error paths.  There are a few ways to make this more elegant.  First, let's start with the JMS piece.  Talend comes with ActiveMQ, so take a look at ActiveMQ composite destinations.
Composite destinations are nice because your actual Camel route does not need to know that a message was sent to two destinations.  It is entirely encapsulated within ActiveMQ.  So book-keeping like message correlation done by your custom monitoring tool can be decoupled from your routes.
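As a rough illustration, a broker-side composite queue in activemq.xml could look something like the fragment below.  The destination names ROUTE.ORDERS.IN and MONITOR.ROUTE.STARTED are made up for the example; substitute whatever your routes and monitoring tool actually use.

```xml
<!-- Fragment of activemq.xml; it belongs inside the <broker> element. -->
<destinationInterceptors>
  <virtualDestinationInterceptor>
    <virtualDestinations>
      <!-- forwardOnly="false": the message stays on ROUTE.ORDERS.IN for the
           route to consume, and a copy is forwarded to the monitoring topic. -->
      <compositeQueue name="ROUTE.ORDERS.IN" forwardOnly="false">
        <forwardTo>
          <topic physicalName="MONITOR.ROUTE.STARTED" />
        </forwardTo>
      </compositeQueue>
    </virtualDestinations>
  </virtualDestinationInterceptor>
</destinationInterceptors>
```

With something like this in place, the producer still sends only to ROUTE.ORDERS.IN, and the monitoring tool subscribes to the topic without the route knowing about it.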
Applying virtual composite destinations to use cases #2 and #3: they allow complete decoupling of the monitoring routines, but those routines of course still need some knowledge of the metadata for the routes, e.g. when a route was scheduled to start and when it was expected to end.
Depending on your design, virtual composite destinations might be sufficient for your needs.  You can combine them with JMS transactional behavior, so that if one of your ESB nodes crashes while processing a message, the transaction rolls back and the message returns to the queue.  It will then be re-processed by one of the remaining 3 servers in your cluster.  Since the message is stateless, the new server has all of the information it needs.
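To show what that schedule metadata might look like on the monitoring side, here is a minimal plain-Java sketch of a Missed Trigger check.  The class MissedTriggerMonitor, its methods, and the grace period are assumptions made for illustration, not anything provided by Talend, Camel, or ActiveMQ.  Each route is registered with its expected interval, every observed start resets the clock, and a route is reported once the interval plus the grace period has passed without a start.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: reports routes that did not start within their expected interval. */
class MissedTriggerMonitor {
    private final Map<String, Duration> expectedInterval = new TreeMap<>();
    private final Map<String, Instant> lastStart = new TreeMap<>();

    /** Register a route with the interval at which it is expected to be triggered. */
    synchronized void register(String routeId, Duration interval, Instant registeredAt) {
        expectedInterval.put(routeId, interval);
        lastStart.put(routeId, registeredAt);
    }

    /** Call when a "started" event for the route is observed; unknown routes are ignored. */
    synchronized void onStarted(String routeId, Instant at) {
        if (expectedInterval.containsKey(routeId)) {
            lastStart.put(routeId, at);
        }
    }

    /** Route IDs whose last start is older than their interval plus the grace period. */
    synchronized List<String> missed(Instant now, Duration grace) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Duration> e : expectedInterval.entrySet()) {
            Instant deadline = lastStart.get(e.getKey()).plus(e.getValue()).plus(grace);
            if (now.isAfter(deadline)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

The check depends only on events the monitor has received, so it keeps working when the server that should have fired the trigger is offline.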
If you want to do more fine-grained monitoring within your route in a non-invasive manner, then look at using Interceptors to apply wiretaps to the relevant Camel Channels.  You can do this using the cIntercept component in Studio.  Note that you can apply interceptors to specific endpoints by filtering on the Exchange.INTERCEPTED_ENDPOINT.