
Breaking up Hive query extract into multiple files?

I am using a Hive query as my source, and the result is written to a file on HDFS (7.7 GB). My aim is to move this file to S3, but S3 has a 5 GB file size limit.
Is there a way for me to break up this file into multiple chunks?
tHDFSConnection --> tHiveConnection --> tHiveInput --> tMap --> tHDFSOutput
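One way to do the split after the extract has been written is a minimal Python sketch like the one below. It assumes the extract is newline-delimited text and has first been copied to local disk (for example with `hdfs dfs -get`). The function name `split_on_lines` and the part-naming scheme are hypothetical. The script cuts only at line boundaries, so every part is still a valid delimited file.

```python
# Stay under S3's 5 GB single-upload limit. 5 * 10**9 bytes is below the
# limit whether "GB" is read as decimal or binary gigabytes.
MAX_PART_BYTES = 5 * 1000**3


def split_on_lines(src_path, out_prefix, max_bytes=MAX_PART_BYTES):
    """Split a newline-delimited file into parts of at most max_bytes each.

    Lines are never cut in half, so every part is still a valid delimited
    file. Returns the paths of the parts written, in order.
    """
    parts = []
    out = None
    written = 0
    try:
        with open(src_path, "rb") as src:
            for line in src:
                if len(line) > max_bytes:
                    raise ValueError("a single line is larger than max_bytes")
                # Open a new part before the first line, or when adding this
                # line would push the current part over the limit.
                if out is None or written + len(line) > max_bytes:
                    if out is not None:
                        out.close()
                    path = f"{out_prefix}{len(parts):05d}"
                    parts.append(path)
                    out = open(path, "wb")
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        # Close the last part even if an error interrupted the loop.
        if out is not None:
            out.close()
    return parts
```

For example, `split_on_lines("extract.txt", "extract.part-")` would write `extract.part-00000`, `extract.part-00001` and so on, each no larger than the limit. A 7.7 GB extract yields at least two parts.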
 

Re: Breaking up Hive query extract into multiple files?

Hello,
Is the output of tHDFSOutput a single 7.7 GB file? Are you executing the job on a Hadoop cluster?
You can take a look at the tELTHive components (tELTHiveInput, tELTHiveMap, tELTHiveOutput). With these, the output is written to a Hive table, but the whole job is executed on the cluster. If your cluster has multiple machines, this would generate separate partition files that you can then move to S3.
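Whether each partition file actually ends up under 5 GB depends on how many tasks write the output, so it may be worth checking the files before moving them. A small sketch in Python, assuming the partition files have been copied into one local directory; the helper name `oversized_parts` is hypothetical:

```python
import os

# Conservative single-upload limit for S3 (5 * 10**9 bytes).
S3_SINGLE_UPLOAD_LIMIT = 5 * 1000**3


def oversized_parts(directory, limit=S3_SINGLE_UPLOAD_LIMIT):
    """Return (name, size) for every regular file in directory larger than limit."""
    result = []
    with os.scandir(directory) as it:
        # Sort by name so the report is stable across runs.
        for entry in sorted(it, key=lambda e: e.name):
            if entry.is_file():
                size = entry.stat().st_size
                if size > limit:
                    result.append((entry.name, size))
    return result
```

An empty result means every partition file can be uploaded in a single operation; any file it lists would still need splitting.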

Re: Breaking up Hive query extract into multiple files?

Hi,
Which is better, the tHive components or the tELTHive components? Please suggest when we should use the tHive components and when we should use the tELTHive components.