Processing files of different formats (.csv, .xls, etc.) from a source directory into a single .csv output file

Four Stars


Hi 

 

I have files in different formats in the source directory, e.g. *.csv, *.dat, *.xls. I need to pick up each file from the directory individually, process it, write the results to a single *.csv file, and move that file to a Hadoop directory.

 

Example: let us assume we have three files: Pay_Slip_TV*.csv, Pay_Slip_Dish*.xls and Pay_Slip_Phone*.csv.

 

I need to pick each file individually from the source folder, process it, combine the results into one single file, e.g. Pay_Slip_Total.csv, and copy that file to the Hadoop directory.

 

Can anyone suggest a method to do this?

 

 


All Replies
Eleven Stars


This could be useful

 

https://community.talend.com/t5/Design-and-Development/merge-data-from-multiple-files-into-one/td-p/...

 

You need to create a separate subjob for XLS and another for CSV, but both can write to the same output file (in Append mode).
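As a rough illustration of what Append mode amounts to, here is a minimal plain-Java sketch, not Talend-generated code. The class name `AppendModeSketch`, the helper `appendRows`, the sample rows, and the `;` separator are all assumptions made for the example; in a real job the output component handles this through its Append setting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of "several subjobs, one output file in append mode".
class AppendModeSketch {

    // Hypothetical helper: adds rows to the end of target,
    // creating the file if it does not exist yet.
    static void appendRows(Path target, List<String> rows) throws IOException {
        Files.write(target, rows, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for Pay_Slip_Total.csv.
        Path out = Files.createTempFile("Pay_Slip_Total", ".csv");

        // Made-up sample rows; each call plays the role of one subjob.
        appendRows(out, Arrays.asList("TV;100"));    // rows from the CSV subjob
        appendRows(out, Arrays.asList("Dish;250"));  // rows from the XLS subjob

        System.out.println(Files.readAllLines(out));
    }
}
```

The sketch does not deal with header rows. If each subjob wrote its own header, the combined file would contain the header more than once, so that is worth checking in the real job.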

Regards
Abhishek KUMAR
Four Stars


akumar2301, thanks for the reply.

The output needs to be in the same file, and that can be done with append mode. But when creating separate sub-jobs for XLS and CSV, how can we take each file individually and route it through the right sub-job? Do we need Java code (tJava), or can we do it with another Talend component?

tFileList -> tMap/tJava (to filter the .csv or .xls files) -> sub-job (to process the .csv or .xls files) -> tMap (to do the appending) -> tFileOutputDelimited (output)

 

Please correct me if I am wrong.

Eleven Stars


To do this:

tFileList ---Iterate---> tJava (does nothing; it only anchors the RunIf triggers)

---RunIf 1, with the condition below---> your CSV subjob

((String)globalMap.get("tFileList_1_CURRENT_FILEEXTENSION")).equalsIgnoreCase("csv")

---RunIf 2, with the condition below---> your XLS subjob

((String)globalMap.get("tFileList_1_CURRENT_FILEEXTENSION")).equalsIgnoreCase("xls")

Change the conditions according to your needs.
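To see how the two RunIf conditions behave outside the Studio, here is a small stand-alone Java sketch. It is not Talend-generated code: an ordinary `HashMap` stands in for `globalMap`, and the class name `RunIfConditionSketch` and the helper `hasExtension` are hypothetical. The helper puts the literal first, which is one way to avoid a `NullPointerException` if the key has not been set.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch of the RunIf conditions; in a real job Talend
// provides globalMap and tFileList_1 sets the extension key.
class RunIfConditionSketch {

    // Key used in the conditions above.
    static final String EXT_KEY = "tFileList_1_CURRENT_FILEEXTENSION";

    // Hypothetical helper: true when the current extension matches,
    // ignoring case. Returns false if the key is missing.
    static boolean hasExtension(Map<String, Object> globalMap, String expected) {
        return expected.equalsIgnoreCase((String) globalMap.get(EXT_KEY));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();

        // Simulate three iterations of the file list.
        for (String ext : new String[] {"csv", "XLS", "dat"}) {
            globalMap.put(EXT_KEY, ext);
            if (hasExtension(globalMap, "csv")) {
                System.out.println(ext + " -> CSV subjob");
            } else if (hasExtension(globalMap, "xls")) {
                System.out.println(ext + " -> XLS subjob");
            } else {
                System.out.println(ext + " -> no subjob");
            }
        }
    }
}
```

In the job itself, the equivalent null-safe form of the first condition would be `"csv".equalsIgnoreCase((String)globalMap.get("tFileList_1_CURRENT_FILEEXTENSION"))`. A file whose extension matches neither condition, such as *.dat here, simply triggers no subjob.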

Regards
Abhishek KUMAR
Four Stars


Thanks for correcting me 

 

This is working fine :-)

 

Eleven Stars


Thanks Nitin, 

 

If everything is OK, please mark the post as resolved. It helps others in the community find the correct solution to their problem.

Regards
Abhishek KUMAR
