tsqoopimport : how to merge generated files

Five Stars

Hello,

I use the tSqoopImport component to import Oracle tables into HDFS.

This component generates 4 files (part-m-xxxx) when I configure it with 4 mappers.

How can I then merge those 4 files into one file?

I use TOS for Big Data 6.1.1.


Accepted Solutions
Five Stars

Re: tsqoopimport : how to merge generated files

OK, I've found it: tHDFSCopy has an option that can merge the files.
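
For anyone who wants to see what such a merge amounts to outside of Talend, here is a minimal Python sketch. It is not what tHDFSCopy does internally; it only concatenates the part files in name order. It assumes the part-m-xxxx files have already been copied to a local directory; the function name, the "part-m-*" pattern and the output path are hypothetical choices for illustration. On a Hadoop client, `hdfs dfs -getmerge <src> <localdst>` should give a similar result directly from HDFS.

```python
import shutil
from pathlib import Path


def merge_part_files(src_dir, dest_path, pattern="part-m-*"):
    """Concatenate every file matching `pattern` in src_dir into dest_path.

    Files are appended in name order (part-m-00000, part-m-00001, ...).
    Returns the number of files merged.
    """
    # Assumption: the part files are local copies, not HDFS paths.
    parts = sorted(Path(src_dir).glob(pattern))
    with open(dest_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the bytes so large files are not loaded into memory.
                shutil.copyfileobj(src, out)
    return len(parts)
```

Keep the output file outside the source directory, or give it a name that does not match the pattern, so it is not picked up as an input.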


All Replies
Moderator

Re: tsqoopimport : how to merge generated files

Hello,

Does the source data come from 2 different sources that share the same schema?

You can use a tFileList to iterate on a tFileInput* row linked to a tFileOutputDelimited in append mode.

Let us know if it works.

Best regards

Sabrina

Eight Stars

Re: tsqoopimport : how to merge generated files

If the data isn't huge, you can try configuring Sqoop to just use one mapper: that way, it will generate one file.
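
As a rough illustration of the single-mapper route outside of the Talend component, the sketch below builds the argument list for a plain `sqoop import` call with `--num-mappers 1`. The helper name is hypothetical, and the connection string, table and target directory in the usage lines are placeholders, not values from this thread.

```python
def build_sqoop_import(jdbc_url, table, target_dir, num_mappers=1):
    """Return the argument list for a `sqoop import` run.

    With num_mappers=1 the import runs as a single map task, so the
    target directory receives one part-m-00000 file.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]


# Placeholder values for illustration only.
cmd = build_sqoop_import(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "MY_TABLE", "/user/demo/my_table"
)
print(" ".join(cmd))
```

The trade-off is that a single mapper gives up parallelism, so this is mainly worth it for small tables.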

 

If you don't want to go that route, Sabrina is mostly correct, except that you'll need tHDFSFileList to iterate over the files. Instead of merging them, this will iterate over them, so you can do whatever ETL work you need to do.
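
To make the "iterate instead of merge" idea concrete, here is a small Python sketch of the same pattern outside of Talend: visit each part file in turn and hand every row to whatever processing you need. It assumes local copies of the files and comma-delimited text (Sqoop's default for text imports); the function name and the "part-m-*" pattern are hypothetical.

```python
import csv
from pathlib import Path


def iter_rows(src_dir, pattern="part-m-*", delimiter=","):
    """Yield (file_name, fields) for every row of every matching part file."""
    # Assumption: the part files are local copies of the HDFS output.
    for part in sorted(Path(src_dir).glob(pattern)):
        with open(part, newline="") as f:
            for fields in csv.reader(f, delimiter=delimiter):
                yield part.name, fields
```

Because this is a generator, rows are read one at a time, so nothing has to be merged or held in memory before the ETL step runs.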

 

David
