Using Talend ESB 6.4.1 (free version) and Talend DI 6.4.1 (free version).
I have ESB Route which is getting messages from a message queue (RabbitMQ).
It then calls a DI Job to process the messages retrieved from the Route.
Sadly the performance is poor when messages flow 1-by-1, without batching.
The Job commits each message to the database individually, and with one insert/commit per invocation I only get ~10 messages per second stored in the database.
I have tested a custom Java JDBC program doing the same work against the database using batching, and performance is much better (~1000-2000 messages per second).
I would like to somehow buffer the incoming messages data inside ESB Route (or inside invoked Job?) so that I only invoke the Job every N-messages - as a batch.
It does not appear possible to call Java code in an ESB Route to create a shared object (such as a Java HashMap or ArrayList) to buffer the incoming data and make it available to the Job.
Does anyone have any suggestions on how to implement this solution to achieve "batched commit" performance?
Could you please also post your job design screenshots on the forum so that we can get more information about your job requirement and redirect it to our ESB experts if need be?
Thank you for your interest.
The requirement is to read messages from RabbitMQ and store them in a PostgreSQL database with high performance, i.e. 1000+ messages per second sustained.
The design flow is a very simple two-step process using an ESB Route + Job.
There is a Route which uses Camel to read messages from RabbitMQ.
The received message then flows through the Route to a Job, which inserts the message (JSON) into a PostgreSQL database table as a single row.
The performance is poor because each row is inserted in its own single-row transaction, with no batching.
What I need to do is to batch the incoming RabbitMQ messages into groups of (say) 1000 and send them to the database as a single batch followed by commit.
I also need to use a timer to commit received messages every N seconds, to cover periods of low activity when few messages arrive.
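(In plain Camel terms, this size-or-timeout batching is what the Aggregator EIP provides. A minimal Spring XML sketch of the idea — the endpoint URIs and bean id here are assumptions for illustration, not taken from the actual Route; Talend ESB Routes generate Camel underneath, so the same settings apply to an aggregate step in the route designer:)

```xml
<!-- Collects individual exchanges into one grouped exchange -->
<bean id="groupedExchanges"
      class="org.apache.camel.processor.aggregate.GroupedExchangeAggregationStrategy"/>

<camelContext xmlns="http://camel.apache.org/schema/spring">
  <route>
    <!-- consume from RabbitMQ (URI is an assumption) -->
    <from uri="rabbitmq:localhost/myexchange?queue=incoming"/>
    <aggregate strategyRef="groupedExchanges"
               completionSize="1000"
               completionTimeout="5000">
      <!-- one global batch: correlate all messages together -->
      <correlationExpression><constant>true</constant></correlationExpression>
      <!-- the batch is released at 1000 messages or after 5 s of inactivity,
           whichever comes first, then handed to the DB-writing step -->
      <to uri="direct:storeBatch"/>
    </aggregate>
  </route>
</camelContext>
```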
I tried using fast job invocation, which improved performance, but about 10 messages per second is still all I can achieve.
Talend DI Job has ability to batch (Batch Insert option for PostgreSQL) and it generates correct performant code (uses JDBC .addBatch() and .executeBatch() methods).
Alas, the Job only gets one message at a time from the Route, so it cannot use the batching effectively (the batch size ends up being 1 row).
Just as a rough prototype, I hand-coded a Java program which reads messages from RabbitMQ one at a time and uses the same JDBC .addBatch() and .executeBatch() methods, followed by a database commit per batch, and I easily get 1000-2000 messages per second stored into the PostgreSQL database.
I was thinking of employing a similar addBatch/executeBatch/commit mechanism inside the ESB Route, but alas there is no way of calling a Java program from the Route to do the data buffering. Buffering inside the Job appears to be problematic, or at least very code-intensive, particularly because I need a timer to commit any outstanding messages every N seconds or every 1000 messages, whichever occurs first.
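(For what it's worth, the size-or-age flush logic itself is small in plain Java. A minimal sketch — the class and names are mine, not Talend API; the flush callback is where the JDBC addBatch/executeBatch/commit against PostgreSQL would go:)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Buffers incoming messages and flushes when either the batch size is
// reached or the oldest buffered message is older than maxAgeMillis.
// Synchronized so a timer thread can call flushIfStale() while the
// consumer thread calls add().
class BatchBuffer {
    private final int maxSize;
    private final long maxAgeMillis;
    private final Consumer<List<String>> flusher;  // e.g. JDBC addBatch/executeBatch + commit
    private final List<String> buffer = new ArrayList<>();
    private long oldest = -1;                      // arrival time of first buffered message

    BatchBuffer(int maxSize, long maxAgeMillis, Consumer<List<String>> flusher) {
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
        this.flusher = flusher;
    }

    synchronized void add(String message) {
        if (buffer.isEmpty()) oldest = System.currentTimeMillis();
        buffer.add(message);
        if (buffer.size() >= maxSize) flush();
    }

    // Called periodically by a timer (e.g. a ScheduledExecutorService).
    synchronized void flushIfStale() {
        if (!buffer.isEmpty() && System.currentTimeMillis() - oldest >= maxAgeMillis) flush();
    }

    private void flush() {
        flusher.accept(new ArrayList<>(buffer));   // hand a copy to the batch writer
        buffer.clear();
    }
}
```

The timer only triggers the commit for partially filled batches, so a quiet queue never leaves rows stranded in the buffer for more than maxAgeMillis.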
I hope this clarifies my current situation.
You can use cThrottler to gather data and only send it to your Job every X messages or every X seconds.
As direct JDBC performs much better, you could try to get the transformed data back into the ESB Route, then in a cProcessor/Java bean use a JDBC library to connect to your DB and send it the data. I already do this (in pure ESB) and it works fine.