Multiple empty files are created when loading data into HDFS using Spark

Seven Stars

Task:

I have a group of messages in a queue. A Spark Streaming job consumes them, picks the latest record among them, and loads it into HDFS.

[Screenshot: Capture.PNG]

 

Issue:

1. I want to save the data into a .csv file, but a numeric suffix is appended to the file name given in the tFileOutput component.

 

[Screenshot: Capture.PNG]

  

Example: I wanted to save the data in maindata.csv, but the job creates a folder named maindata.csv-1522775132000 and saves the data inside that folder.

[Screenshot: Capture.PNG]

2. The job creates 14 empty partition files and writes all the data into the 15th partition file.

 

Expected Output:

1. Can I write the data directly into maindata.csv?

2. Can I set the number of partitions according to the data?

 

Thanks in advance!!

Employee

Re: Multiple empty files are created when loading data into HDFS using Spark

One option for Issue 1 is to select the 'Merge result to single file' option in the tFileOutputDelimited component properties and set the 'Merge File Path' property to the path of your maindata.csv file.

This creates a file with the name of your choice, in the path you define, containing the data from all the part- files merged into one file. Optionally, you can also remove the source directory and/or override the target file.
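
To make the merge step more concrete, here is a minimal sketch of the idea in plain Java. It is not Talend's implementation: the class name PartFileMerger and the method mergeParts are hypothetical, and it works on a local directory through java.nio rather than on HDFS (on a cluster the same logic would go through the Hadoop FileSystem API). Empty part- files contribute nothing to the result, so the sketch simply skips them.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only (not part of Talend or Spark).
public class PartFileMerger {

    /**
     * Concatenates every non-empty "part-*" file found in sourceDir into target,
     * in file-name order, and returns the number of part files that were copied.
     * Marker files such as _SUCCESS and hidden checksum files are ignored
     * because their names do not start with "part-".
     */
    public static int mergeParts(Path sourceDir, Path target) throws IOException {
        List<Path> parts;
        try (Stream<Path> entries = Files.list(sourceDir)) {
            parts = entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
        }

        int merged = 0;
        // With no options, newOutputStream creates the file or truncates an existing one.
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                if (Files.size(part) == 0) {
                    continue; // empty partition output, nothing to copy
                }
                Files.copy(part, out);
                merged++;
            }
        }
        return merged;
    }
}
```

Under these assumptions, calling PartFileMerger.mergeParts on the generated maindata.csv-<timestamp> folder with a target path ending in maindata.csv would leave a single file containing only the rows from the non-empty part- files.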

 

Hope this helps.
