
Multiple empty files are created when loading data into HDFS using Spark

Task:

I have a group of messages in a queue. A consumer reads them, a Spark Streaming job picks the latest record among them, and the result is loaded into HDFS.

 

Issue:

1. I want to save the data to a .csv file, but a numeric suffix is appended to the file name given in the tFileOutput component.

 


  

Example: I wanted to save the data in maindata.csv, but the job creates a folder named maindata.csv-1522775132000 and saves the data inside that folder.


2. The job creates 14 empty partition files and writes all the data into the 15th partition file.

 

Expected Output:

1. Can I write the data into maindata.csv?

2. Can I determine the number of partitions according to the data?

 

Thanks in advance!!

1 REPLY

Re: Multiple empty files are created when loading data into HDFS using Spark

One option for Issue 1 is to check the 'Merge result to single file' option in the tFileOutputDelimited component properties and set the 'Merge File Path' property to the path of maindata.csv.

This creates a file with the name you choose, at the path you define, with the data from all the part- files merged into it. Optionally, you can remove the source directory and/or override the target file.
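
To make the merge step concrete, here is a minimal sketch of the same idea in plain Python. It is not the component's actual implementation, which is not shown in this thread. It assumes the part- files sit in a local directory rather than on HDFS, and the function name merge_part_files and its remove_source parameter are hypothetical, chosen only to mirror the options described above.

```python
import shutil
from pathlib import Path


def merge_part_files(source_dir, target_file, remove_source=False):
    """Concatenate every part-* file in source_dir into target_file.

    Hypothetical helper for illustration only; it works on a local
    directory, not on HDFS, and target_file must live outside source_dir.
    """
    source = Path(source_dir)
    target = Path(target_file)
    # Sort by name so part-00000, part-00001, ... are merged in order.
    parts = sorted(p for p in source.glob("part-*") if p.is_file())
    with target.open("wb") as out:
        for part in parts:
            # Empty part files contribute nothing, so skip them.
            if part.stat().st_size == 0:
                continue
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    if remove_source:
        # Mirrors the optional "remove the source directory" behaviour.
        shutil.rmtree(source)
    return target
```

For the example in the question, this would be called as merge_part_files("maindata.csv-1522775132000", "maindata.csv", remove_source=True). The 14 empty part files add nothing to the result, so the merged file holds only the rows from the one populated partition.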

 

Hope this helps.