Hi, I have a use case where sensor data comes in through RabbitMQ. We have to consume the data from the queue, do real-time (streaming) processing on it, make some decisions based on the results, and finally store the data in Hadoop. Please suggest how to implement this in Talend. Should I be using Spark components? If so, how do I connect them to RabbitMQ to consume messages?
Hi, sorry for the delay. Are you looking for a solution using our community tool (Talend Open Studio for ESB) or using our Enterprise solutions? One of our experts suggests starting with a standard ESB Route (using cJMS) connected to a DI Job that does the decision making and storage in Hadoop (the DI components already provide the necessary tooling). However, you may have to do some infrastructure tuning (especially on your RabbitMQ side), and if after some testing the velocity isn't satisfactory, you'll need to consider our dedicated Talend Real-Time Big Data Platform. Elisa
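For reference, the consume → decide → store loop that the suggested Route + DI Job would implement can be sketched outside Talend like this. It is a minimal illustration, assuming a local RabbitMQ broker, a queue named `sensor.readings`, a JSON message body with a `temperature` field, and a hypothetical alert threshold; the `pika` AMQP client and the field names are assumptions, not Talend components.

```python
import json

# Hypothetical decision rule: flag readings above a threshold.
# The threshold and the "temperature" field name are illustrative assumptions.
ALERT_THRESHOLD = 75.0

def decide(reading):
    """Return the action to take for one sensor reading (a dict)."""
    if reading.get("temperature", 0.0) > ALERT_THRESHOLD:
        return "alert"
    return "store"

def consume(queue="sensor.readings", host="localhost"):
    """Consume readings from RabbitMQ and act on each one.

    Requires the third-party pika client and a running broker;
    intentionally not called at import time.
    """
    import pika  # third-party AMQP 0-9-1 client
    connection = pika.BlockingConnection(pika.ConnectionParameters(host))
    channel = connection.channel()
    channel.queue_declare(queue=queue, durable=True)

    def on_message(ch, method, properties, body):
        reading = json.loads(body)
        action = decide(reading)
        # In the Talend Job, this is where the HDFS write would happen.
        print(action, reading)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=queue, on_message_callback=on_message)
    channel.start_consuming()
```

The decision step (`decide`) is the part a DI Job would express with its own components; the rest is plumbing that cJMS handles for you.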
Hi Elisa, thanks for the reply, but our requirement is as follows. We are using the Big Data Sandbox trial version to check the feasibility of real-time data analytics with the Spark Big Data components before buying them. A huge amount of streaming data comes from the sensors of a manufacturing plant into RabbitMQ. This data needs to be streamed to Spark for real-time processing and analysis, with Spark processing the incoming data and storing it in Hadoop HDFS in a streaming manner. In the demo Jobs currently available in the Sandbox, Spark reads data from an HDFS file and processes it, but we need this in real time over the MQTT or AMQP protocols. Are any sample Spark Jobs for real-time data processing available? Also, can we connect RabbitMQ to Spark? If yes, could you please share a sample Job for the above requirements? Thanks in advance.
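Until an official sample Job turns up, here is a non-Talend sketch of the pipeline described above: RabbitMQ exposes the sensor stream over MQTT (via its `rabbitmq_mqtt` plugin), Spark Streaming subscribes with the external `spark-streaming-mqtt` connector (`MQTTUtils`, available for Spark 1.x), and each micro-batch is appended to HDFS. The broker URL, topic name, and the message schema in `parse_reading` are assumptions for illustration only.

```python
import json

def parse_reading(payload):
    """Parse one MQTT message body (a JSON string) into a (sensor_id, value) pair.

    The field names are an assumed message schema.
    """
    msg = json.loads(payload)
    return msg["sensor_id"], float(msg["value"])

def run(broker_url="tcp://rabbitmq-host:1883", topic="sensors/readings"):
    """Spark Streaming job: MQTT in, HDFS out.

    Requires pyspark plus the spark-streaming-mqtt package on the classpath;
    intentionally not called at import time.
    """
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.mqtt import MQTTUtils  # external connector (Spark 1.x)

    sc = SparkContext(appName="SensorStream")
    ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches

    lines = MQTTUtils.createStream(ssc, broker_url, topic)
    readings = lines.map(parse_reading)

    # Write each micro-batch to HDFS as timestamped text files.
    readings.saveAsTextFiles("hdfs:///sensor/readings/batch")

    ssc.start()
    ssc.awaitTermination()
```

The same shape applies inside Talend's Spark Streaming Jobs; the open question is only which input component plays the role of `MQTTUtils.createStream` against your RabbitMQ broker.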