Hello, I am trying to create a job which does the following:

1. Fetch a newly created file from a local directory.
2. Copy this file into HDFS using tHDFSPut.
3. Connect to Hive.
4. Create a table in Hive, named after the file, with a pre-defined CSV-equivalent structure, using tHiveCreateTable.
5. Load the data from the HDFS CSV into Hive using tHiveLoad.

Up to this point I am successful. Now, in the same job, I have to move the same data into MySQL, in serial mode only, after querying the newly created table in Hive. I have tried adding a tHiveInput, but I am not able to connect it as an outgoing link of tHiveLoad by any means, i.e. not via "Iterate", "OnSubjobOk", or "OnComponentOk". How can I accomplish this?
S/W inventory in use:

1. Talend for Big Data (Community Edition) v6.0.0
2. Cloudera CDH 5.4 sandbox, running in Oracle VirtualBox
3. Ubuntu 14.04 as the machine OS

Please advise.
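For context, the five steps above are equivalent to roughly the following commands outside Talend. This is a minimal sketch: the local path, HDFS paths, table name, and column layout are all hypothetical placeholders, and it assumes the `hdfs` and `hive` CLIs from the CDH sandbox.

```shell
# Step 2: copy the newly created CSV into HDFS (paths are hypothetical)
hdfs dfs -put /local/data/new_file.csv /user/cloudera/staging/

# Steps 3-5: create the Hive table and load the file
# (table and column names are hypothetical placeholders)
hive -e "
CREATE TABLE IF NOT EXISTS new_file (
  id INT,
  name STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

LOAD DATA INPATH '/user/cloudera/staging/new_file.csv' INTO TABLE new_file;
"
```

Note that `LOAD DATA INPATH` moves (not copies) the file within HDFS, which mirrors what tHiveLoad does.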
That is not the proper way of doing it. You need to put all the data into HDFS first, then use tHiveLoad to load it into the desired database. If you want to load the data into MySQL, you can connect tHiveInput --main--> tMysqlOutput directly, or use the tSqoopExport component. I have never tried the latter, but you can test it.
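If tSqoopExport turns out to be an option, the Sqoop command it wraps would look roughly like the sketch below. The JDBC URL, credentials, target table, warehouse path, and delimiter are all hypothetical assumptions; adjust them to your environment.

```shell
# Export the Hive table's warehouse directory to MySQL
# (connection details, table name, and paths are hypothetical)
sqoop export \
  --connect jdbc:mysql://mysql-host:3306/targetdb \
  --username dbuser -P \
  --table new_file \
  --export-dir /user/hive/warehouse/new_file \
  --input-fields-terminated-by ','
```

This assumes the MySQL table already exists with a schema matching the Hive table, since `sqoop export` does not create the target table.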
Hi Umesh, I am trying to create a serialized job. Once the data goes into Hive, I have to read it from there, do some transformations, and load it into MySQL. The main problem is that I am not able to connect tHiveLoad --OnSubjobOk--> tHiveInput. Is it possible to do this? The option you suggested works fine in a separate, independent job, but how do I incorporate it into my current job? Regards, Vicky