Breaking up Hive query extract into multiple files?



I am using a Hive query as my source, and the result is written to a single 7.7 GB file on HDFS. My aim is to move this file into S3, but S3 limits a single upload (PUT) to 5 GB.
Is there a way for me to break this file up into multiple chunks? My job looks like this:
tHDFSConnection --> tHiveConnection --> tHiveInput --> tMap --> tHDFSOutput
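One way to do the split outside the Talend job is a small script that cuts the extract into parts under a size limit, breaking only at line boundaries so that no row is split across two files. This is a minimal Python sketch, assuming the extract is newline-delimited text that has already been copied to local disk (for example with `hdfs dfs -get`); `split_by_size` and the `.partNNNN` file naming are hypothetical, not part of Talend.

```python
def split_by_size(path, max_bytes, out_prefix):
    """Split a newline-delimited file into parts of at most max_bytes each,
    breaking only at line boundaries so no record is cut in half.
    Returns the list of part file paths in order."""
    part_paths = []
    out = None      # handle of the part currently being written
    written = 0     # bytes written to the current part so far
    try:
        with open(path, "rb") as src:
            # Iterating a binary file yields lines that keep their trailing b"\n".
            for line in src:
                if len(line) > max_bytes:
                    raise ValueError("a single line is larger than max_bytes")
                # Start a new part if this line would push the current one over the limit.
                if out is None or written + len(line) > max_bytes:
                    if out is not None:
                        out.close()
                    part_path = f"{out_prefix}.part{len(part_paths):04d}"
                    out = open(part_path, "wb")
                    part_paths.append(part_path)
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return part_paths
```

For example, `split_by_size("extract.csv", 4 * 1024**3, "extract")` would write `extract.part0000`, `extract.part0001`, and so on, each at most 4 GiB, which leaves some headroom under the 5 GB limit. If the extract has a header row, only the first part will contain it.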
 

Re: Breaking up Hive query extract into multiple files?

Hello,
Is the output of tHDFSOutput a single 7.7 GB file? Are you executing the job on a Hadoop cluster?
You can take a look at the tELTHive components (tELTHiveInput, tELTHiveMap, tELTHiveOutput). The output will be written to a Hive table, but the whole job will be executed on the cluster. If you're using a cluster with multiple machines, this would generate separate partition files that you can then move to S3.
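Before moving the partition files to S3, it may be worth confirming that none of them is still over the single-upload limit. This is a minimal Python sketch, assuming the partition files have been copied into a local directory and taking the limit as 5 GiB; `oversized_files` and `S3_SINGLE_PUT_LIMIT` are hypothetical names.

```python
import os

# Assumed single-PUT limit of 5 GiB (5 * 1024**3 bytes).
S3_SINGLE_PUT_LIMIT = 5 * 1024**3

def oversized_files(directory, limit=S3_SINGLE_PUT_LIMIT):
    """Return sorted (name, size) pairs for every regular file in `directory`
    whose size in bytes exceeds `limit`. Subdirectories are skipped."""
    too_big = []
    for entry in os.scandir(directory):
        if entry.is_file():
            size = entry.stat().st_size
            if size > limit:
                too_big.append((entry.name, size))
    return sorted(too_big)
```

An empty result means every file in the directory fits in a single upload; any file that is listed would still need splitting before it is moved.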

Re: Breaking up Hive query extract into multiple files?

Hi,
Which is better, the tHive components or the tELTHive components? Please suggest when we should use the tHive components and when we should use the tELTHive components.
