I have a CSV file with raw data and I'm trying to load it into Hive table that uses the Parquet format. I found a way to do this but I was wondering if there is an easier way to do it which would only require 1 single job.
Here's how I did it:
- a Big Data Batch job which reads the CSV file from HDFS (tFileInputDelimited) and outputs it as a Parquet file (tFileOutputParquet)
- a Standard job with just the tHiveLoad component which reads the Parquet file and loads it into the Hive table
My question is: is there a way to do this in 1 single job?
I tested it but I get an error "PartialGroupNameException Does not support partial group name resolution on Windows. Incorrect command line arguments."
Any clue what this means?
Talend named a Leader.
Kickstart your first data integration and ETL projects.
Watch the recorded webinar!
Learn how to make your data more available, reduce costs and cut your build time
Read about OTTO's experiences with Big Data and Personalized Experiences
Take a look at this video about Talend Integration with Databricks