I have several ETL jobs running in Talend. Due to the amount of data, we would like to schedule these jobs on a Hadoop cluster. What kinds of jobs can Talend schedule via Oozie/Hadoop? Can I just take my current job (a combination of MS SQL inputs, joined with tMap and exported to CSV for Google BigQuery, with some Java code for transforms) and run it in the Oozie scheduler? Or do I need to rewrite my joins in Pig Latin? And what would be the best strategy for getting data from MS SQL into the Hadoop cluster using Talend?
Any Talend job that uses a big data connector (HDFS, Hive, Pig, etc.) should be schedulable from the Oozie tab, provided the cluster information has been filled in. You would need to convert your joins to Pig Latin or use the tPigMap component. For writing RDBMS tables to HDFS or Hive, the Sqoop components are probably the best option.
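To illustrate what the Sqoop-based ingestion looks like under the hood, here is a sketch of the equivalent `sqoop import` command for pulling a SQL Server table into HDFS. The host name, database, table, paths, and credentials below are all placeholders, not values from your environment; Talend's tSqoopImport component generates a call along these lines from its settings:

```shell
# Hypothetical example: importing a SQL Server table into HDFS with Sqoop.
# All connection details, table names, and paths are placeholders.
sqoop import \
  --connect "jdbc:sqlserver://mssql-host:1433;databaseName=SalesDB" \
  --username etl_user \
  --password-file /user/etl/.mssql_password \
  --table Orders \
  --target-dir /data/staging/orders \
  --num-mappers 4 \
  --as-textfile
```

If you want the data landed directly in a Hive table instead of raw HDFS files, adding `--hive-import` (and dropping `--target-dir`) does that in one step. Note that Sqoop needs the Microsoft JDBC driver for SQL Server on its classpath.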