Load a CSV file into Hive Parquet table

Four Stars

Hello,

 

I have a CSV file with raw data and I'm trying to load it into a Hive table that uses the Parquet format. I found a way to do this, but I was wondering if there is an easier way that would require only a single job.

 

Here's how I did it:

- a Big Data Batch job which reads the CSV file from HDFS (tFileInputDelimited) and outputs it as a Parquet file (tFileOutputParquet)

- a Standard job with just the tHiveLoad component which reads the Parquet file and loads it into the Hive table
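For comparison, the same CSV-to-Parquet load can also be expressed directly in HiveQL: an external text table over the CSV directory, a Parquet-backed target table, and an INSERT ... SELECT between them. The sketch below is not the Talend job described above and is not a method proposed in this thread; it only builds the statements as strings. The function name, table names, columns, and HDFS path are all hypothetical placeholders, and it assumes the CSV has no header row and no quoted fields.

```python
def build_csv_to_parquet_hql(staging, target, columns, csv_dir, delimiter=","):
    """Return HiveQL statements that copy delimited text data into a Parquet table.

    All arguments are illustrative placeholders:
      staging   - name of an external table laid over the raw CSV files
      target    - name of the Parquet-backed table
      columns   - list of (column_name, hive_type) pairs
      csv_dir   - HDFS directory that contains the CSV file(s)
    """
    cols = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return [
        # 1. Expose the raw CSV directory as a text-format table (no data is moved).
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {staging} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        f"STORED AS TEXTFILE LOCATION '{csv_dir}'",
        # 2. Create the target table with Parquet as its storage format.
        f"CREATE TABLE IF NOT EXISTS {target} ({cols}) STORED AS PARQUET",
        # 3. Let Hive rewrite the rows into Parquet while copying them across.
        f"INSERT OVERWRITE TABLE {target} SELECT * FROM {staging}",
    ]


if __name__ == "__main__":
    # Hypothetical schema and path, for illustration only.
    statements = build_csv_to_parquet_hql(
        staging="raw_csv_staging",
        target="events_parquet",
        columns=[("id", "INT"), ("name", "STRING")],
        csv_dir="/data/raw/events",
    )
    for statement in statements:
        print(statement + ";")
```

The printed statements would still need to be run against Hive by whatever client or component is available; whether that fits into one Talend job depends on the setup.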

 

My question is: is there a way to do this in a single job?

 

Many thanks,

Axel

Thirteen Stars

Re: Load a CSV file into Hive Parquet table

Hi Axel

 

What's wrong with tHiveOutput?

 

regards, Vlad

-----------
Four Stars

Re: Load a CSV file into Hive Parquet table

Hi Vlad, thanks for your reply. Are you saying that it should work fine if I connect tFileInputDelimited to tHiveOutput if I want the Hive table in Parquet format? Sorry, I'm fairly new to Talend.
Thirteen Stars

Re: Load a CSV file into Hive Parquet table

Why not just test it? :)
It supports the Parquet format.
-----------
Four Stars

Re: Load a CSV file into Hive Parquet table

I tested it, but I get this error: "PartialGroupNameException Does not support partial group name resolution on Windows. Incorrect command line arguments."

 

Any clue what this means?
