I need to load a CSV file into Hive tables so that I can write complex logic in Hive queries instead of as expressions in Talend.
Do I have to follow these steps? I have not tested them yet.
1. Use tHiveCreateTable to create a temp table.
2. tFileInputDelimited (CSV file) -> tHDFSOutput: store the CSV file in HDFS.
3. Use tHiveLoad to load the HDFS file into the temp Hive table.
4. Use tHiveInput to read the data from the temp table.
5. At the end of the Talend job, use tHiveRow to drop the temp table.
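For reference, the Hive side of steps 1, 3, 4 and 5 comes down to a few HiveQL statements. The Python helpers below are only a sketch that builds those statements as strings so you can see what each component has to run; the exact SQL Talend generates may differ, and the table names, columns and HDFS path are hypothetical placeholders. Note that `LOAD DATA INPATH` (without `LOCAL`) moves the file from its HDFS location into the table's directory, so it will no longer be at the path tHDFSOutput wrote it to.

```python
# Sketch only: HiveQL roughly corresponding to steps 1, 3, 4 and 5.
# All table, column and path names used with these helpers are hypothetical.

def create_temp_table(table, columns):
    # Step 1 (tHiveCreateTable): a plain-text table with comma-separated fields,
    # so a simple CSV file can be loaded into it as-is.
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "STORED AS TEXTFILE"
    )


def load_from_hdfs(hdfs_path, table):
    # Step 3 (tHiveLoad): moves the HDFS file into the table's directory.
    return f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table}"


def join_query(temp_table, other_table, key):
    # Step 4 (tHiveInput): the query can join the temp table with existing Hive tables.
    return (
        f"SELECT t.*, o.* FROM {temp_table} t "
        f"JOIN {other_table} o ON t.{key} = o.{key}"
    )


def drop_temp_table(table):
    # Step 5 (tHiveRow): clean up at the end of the job.
    return f"DROP TABLE IF EXISTS {table}"


if __name__ == "__main__":
    print(create_temp_table("tmp_csv", [("id", "INT"), ("name", "STRING")]))
    print(load_from_hdfs("/tmp/in.csv", "tmp_csv"))
    print(join_query("tmp_csv", "customers", "id"))
    print(drop_temp_table("tmp_csv"))
```

If the CSV has a header row, Hive's `skip.header.line.count` table property can skip it. `FIELDS TERMINATED BY ','` also does not handle quoted fields that contain commas; for that kind of file, Hive's `OpenCSVSerde` is the usual alternative.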
If it's not a heavy-duty production process, I suggest using the WebHDFS REST API to upload the data.
No packages, no JARs; you just need cURL or tRestClient.
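A WebHDFS upload is two HTTP `PUT` requests, the same as the two `curl -X PUT` calls in the Hadoop WebHDFS docs: first `op=CREATE` on the namenode, which answers `307` with a `Location` header pointing to a datanode, then the file bytes sent to that location, which answers `201 Created`. The sketch below does this in Python with only the standard library (swapped in here for cURL/tRestClient purely for illustration). It assumes simple authentication via the `user.name` parameter (no Kerberos), and the host, port and path are placeholders (the namenode HTTP port is typically 9870 on Hadoop 3 and 50070 on Hadoop 2).

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def webhdfs_create_url(namenode, hdfs_path, user, overwrite=True):
    # op=CREATE goes to the namenode, which replies with a 307 redirect to a datanode.
    query = urlencode({
        "op": "CREATE",
        "user.name": user,  # simple auth only; Kerberos clusters need SPNEGO instead
        "overwrite": "true" if overwrite else "false",
    })
    return f"http://{namenode}/webhdfs/v1{quote(hdfs_path)}?{query}"


def _put(url, body=None):
    # http.client never follows redirects, so the namenode's 307 is visible here.
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=60)
    try:
        target = parts.path + ("?" + parts.query if parts.query else "")
        conn.request("PUT", target, body=body)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def upload_csv(local_file, namenode, hdfs_path, user):
    # Request 1: ask the namenode where to write (no body, expect 307 + Location).
    status, location = _put(webhdfs_create_url(namenode, hdfs_path, user))
    if status != 307 or not location:
        raise RuntimeError(f"expected 307 redirect from namenode, got HTTP {status}")
    # Request 2: send the file to the datanode URL (expect 201 Created).
    # The whole file is read into memory, which is fine for small, non-production loads.
    with open(local_file, "rb") as f:
        status, _ = _put(location, body=f.read())
    if status != 201:
        raise RuntimeError(f"datanode upload failed with HTTP {status}")
```

Usage would be something like `upload_csv("in.csv", "namenode.example.com:9870", "/tmp/in.csv", "talend")`. The redirect usually names the datanode by its cluster hostname, so the machine running the job must be able to resolve and reach the datanodes, not just the namenode.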
After loading the CSV into the temp Hive table, I need to join it with other Hive tables.
So I prefer to do everything in Talend. Thank you.