
Multiple empty files are created when loading data into HDFS using Spark


I have a group of messages in a queue. A consumer reads them, a Spark Streaming job keeps the latest record among them, and the result is loaded into HDFS.



1. I want to save the data to a .csv file, but a numeric pattern is appended to the file name that I set in the tfileOutput component.




Example: I want to save the data as maindata.csv, but the job creates a folder named maindata.csv-1522775132000 and writes the data inside that folder.
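For context: in Spark Streaming, `DStream.saveAsTextFiles(prefix)` appends the batch timestamp in milliseconds to the prefix, which is exactly why a `maindata.csv-1522775132000` folder shows up, and Spark always writes a *directory* of `part-*` files rather than a single file. A common workaround (a sketch, not Talend-specific) is to merge the part files afterwards, as `hadoop fs -getmerge` does; in plain Python the merge step looks roughly like this:

```python
from pathlib import Path

def merge_part_files(out_dir: str, dest: str) -> None:
    """Concatenate Spark's part-* files (in name order) into one file,
    mimicking `hadoop fs -getmerge <out_dir> <dest>`."""
    with open(dest, "w", encoding="utf-8") as merged:
        for part in sorted(Path(out_dir).glob("part-*")):
            merged.write(part.read_text(encoding="utf-8"))

# Hypothetical usage after the streaming batch has finished:
# merge_part_files("maindata.csv-1522775132000", "maindata.csv")
```

Alternatively, calling `coalesce(1)` (or `repartition(1)`) on the RDD before writing forces a single part file inside the output folder, at the cost of parallelism; you would still need to rename/move that one file to get a literal `maindata.csv`.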


2. The job creates 14 empty partition files and inserts all the data into the 15th partition file.
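Those empty files usually mean the RDD has more partitions than it has data to fill them: with 15 partitions and records that all hash to the same partition (e.g. a single key after keeping only the "latest" record), 14 part files come out empty. A toy pure-Python model of that hash-partitioning behaviour (illustrative only, not Spark's actual code):

```python
def hash_partition(records, num_partitions):
    """Toy model of Spark's HashPartitioner: route each (key, value)
    record to partition hash(key) % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

# All records share one key, so they land in a single partition
# and the other 14 stay empty -- matching the 14 empty part files.
records = [("latest", i) for i in range(100)]
parts = hash_partition(records, 15)
print(sum(1 for p in parts if not p))  # 14 empty partitions
```

In Spark itself the fix is to size the partition count to the data before writing, e.g. `rdd.coalesce(n)` with `n` derived from the amount of data, or `repartition(1)` if a single output file is acceptable.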


Expected Output:

1. Can I write the data directly into maindata.csv?

2. Can I determine the number of partitions according to the data?


Thanks in advance!!