Hi, I am new to Talend and have a requirement to read data from a JMS queue and load it into Hadoop (Hive or HBase) using Talend. Can you point me to any sample tutorials? If none are available, could you guide me through the steps for developing this? I currently have TOS_BD-20150702_1326-V6.0.0 installed. Do I need to install Talend Open Studio for ESB or Talend Enterprise ESB Studio, as mentioned in the tutorial "How to create mediation route"? Any suggestions would be much appreciated. Thanks, Sandeep
Talend ESB is for real-time data or application integration, where you create services that are always on. I suspect you just want to read a queue and post that data to Hadoop? You can do this with the Big Data (BD) TOS that you already downloaded. Use tJMSInput to read the queue (or try tMomInput if tJMSInput does not work), parse the data (maybe XML?) with one of the XML components, then load the data to Hadoop.

Because queues are always on, you can set your job to always listen (tMomInput): read repeatedly and drain to Hadoop. In that case, you're probably better off draining the queue and writing directly to Hive (to a staging table or the final table). As you'll see below, loading into a Hive table with tHiveLoad means writing the data to a file first. If, on the other hand, you know the HDFS directory of the Hive table (usually under /user/hive/warehouse/dbname.db/tablename), you can 'stream' the data end to end without having to write to disk en route.

If this is a new project, you probably also want to look into the newer ways of doing this: Kafka, Flume and Spark Streaming, all of which are supported in the latest versions of Talend. These are the newer and better ways of 'streaming' data to Hadoop.
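
To make the "drain the queue into the Hive table's directory" idea a bit more concrete, here is a rough sketch in plain Java. It is not what Talend generates and it does not talk to a real broker or to HDFS. A BlockingQueue stands in for the JMS queue, a local directory stands in for the table's HDFS directory, and the "id|payload" message format, the class name and the part-00000 file name are all made up for illustration. It also assumes an unpartitioned text-format Hive table using Hive's default Ctrl-A field delimiter. In a real job the poll would be a JMS receive with a timeout and the writer would be an HDFS output stream.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueDrainSketch {

    // Default field delimiter for Hive text tables is Ctrl-A.
    static final char FIELD_DELIMITER = '\u0001';

    // Turns one message into one delimited row.
    // Assumption: messages look like "id|payload" (hypothetical format).
    static String toRow(String message) {
        String[] parts = message.split("\\|", 2);
        String id = parts[0];
        String payload = parts.length > 1 ? parts[1] : "";
        return id + FIELD_DELIMITER + payload;
    }

    // Polls the queue until it stays empty for pollMillis, appending each
    // message as one row to a data file inside tableDir.
    // Returns the number of rows written.
    static int drain(BlockingQueue<String> queue, Path tableDir, long pollMillis)
            throws IOException, InterruptedException {
        Files.createDirectories(tableDir);
        Path dataFile = tableDir.resolve("part-00000"); // hypothetical file name
        int rows = 0;
        try (BufferedWriter out = Files.newBufferedWriter(
                dataFile, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            String message;
            // Stand-in for a JMS receive with a timeout.
            while ((message = queue.poll(pollMillis, TimeUnit.MILLISECONDS)) != null) {
                out.write(toRow(message));
                out.write('\n'); // one row per line
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for /user/hive/warehouse/dbname.db/tablename
        Path tableDir = Files.createTempDirectory("tablename");
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("1|first message");
        queue.add("2|second message");
        int rows = drain(queue, tableDir, 100);
        System.out.println("rows written: " + rows + " into " + tableDir);
    }
}
```

For an always-on listener you would wrap the drain call in a loop and start a new data file per batch, so that Hive never reads a file that is still being written.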