One Star

Splitting up into multiple CSV files based on column values then FTP

Hi,
I can see there's an option to split files up based on row counts, is it possible to split based on value(s) in a column (varchar, dynamic list driven from table)?
These values then will appear in file name also - ie. YYMMDD_RED.csv, YYMMDD_YELLOW.csv, YYMMDD_GREEN.csv - assuming colour is my column here that i want to split records into separate files, many records being red, yellow, and green.
As an aside, previous attempts to FTP I used FTP repository to specify credentials to put files on FTP. Is it possible to set this via contexts? - which can be set by values at database/file. I was slightly bemused when it came to password, as it didn't seem feasible to type in a context variable in here.
Regards,
John
11 REPLIES
Moderator

Re: Splitting up into multiple CSV files based on column values then FTP

Hi,
These values then will appear in file name also - ie. YYMMDD_RED.csv, YYMMDD_YELLOW.csv, YYMMDD_GREEN.csv - assuming colour is my column here that i want to split records into separate files, many records being red, yellow, and green.


Could you please elaborate your case with an example with input and expected output values?
As an aside, previous attempts to FTP I used FTP repository to specify credentials to put files on FTP. Is it possible to set this via contexts? - which can be set by values at database/file. I was slightly bemused when it came to password, as it didn't seem feasible to type in a context variable in here.

Do you want to mask your password on FTP connection? Please take a look at a related scenario in component reference:
TalendHelpCenter:tContextLoad
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Splitting up into multiple CSV files based on column values then FTP

I want to use contexts to set the FTPGet component.
The other example was an example, not using real data but can be applied universally.
Imagine this as a table
Car number   Colour
1                 Red
2                 Yellow
3                 Yellow
4                 Green
5                 Green
So car number 1 should appear in red file.
2&3 for yellow file.
4&5 in Green file.
Imagine colour is a dynamic list that could change, driven by table data - hence want to split files dynamically based on column values
Many thanks in advance.
One Star

Re: Splitting up into multiple CSV files based on column values then FTP

Hi,
I have a huge csv file with date column(dd-mm-yyyy). 
I would like to generate multiple files for each year.  For example all 2016 data has to store in 2016 folder and 2015 year data has to store in 2015 folder automatically and so on for all the years.
It is urgent...PLz help me on this.
Regards,
Asha
One Star

Re: Splitting up into multiple CSV files based on column values then FTP

The easiest way is to open n times your file, not very efficient bur pretty simple
1)you get all the years, uniq those years, then you iterate (tflowtoiterate)
At this point you have an iteration for each different year in your csv
2) you put the year in a variable, then you reopen your file and filter on the year you need, and when you write on your files, you just have to put your variable in the file path. 
could be something like

tFileInputDelimited
tUniqRow
tFlowToIterate
Tjava or TSetGlobalVar
tFileInputDelimited
tFilterRow
tFileOutputDelimited
If you have anyquestion let me know
Fifteen Stars

Re: Splitting up into multiple CSV files based on column values then FTP

This tutorial demonstrates how to do this with a simple bit of code
https://www.rilhia.com/tutorials/load-data-dynamic-number-files
Rilhia Solutions
One Star

Re: Splitting up into multiple CSV files based on column values then FTP

Hi,
Can anyone explain how to merge different schema csv files into a single csv file?
I have 4 different files with different headers and one column is common in 4 files. I want all four files data into one with all distinct columns.
Regards,
Asha
Fifteen Stars

Re: Splitting up into multiple CSV files based on column values then FTP

This isn't really related to the original question. However you can do this with a tMap. The column that is common between all of the csv files should be used as the join. 
Rilhia Solutions
One Star

Re: Splitting up into multiple CSV files based on column values then FTP

Hi rhall,
Thank you for your quick response.
Is there any other alternative without using tmap component can we join all csv files as a single file?
Because i need all files data  in a single file/ If I join all files then it will give only matched columns.
Regards,
Asha
Fifteen Stars

Re: Splitting up into multiple CSV files based on column values then FTP

There is an easy solution to this, but maybe not the most efficient (It would be easier to solve with access to the files). If you know that some rows will not match in some files, but you want them all, then you could add a pre-processing step to create another file with all of the keys from all files. Then use that file as your driving (Main) file input and join in all of the others. 
Rilhia Solutions
One Star

Re: Splitting up into multiple CSV files based on column values then FTP

Hi,
Can any one help me on this.
I am getting  "Preview Error. Some settings must be changed".
Below is the detailed error:
Exception in thread "main" java.lang.Error: Unresolved compilation problems: 
com.talend cannot be resolved to a type
com.talend cannot be resolved to a type
com.talend.csv.CSVWriter cannot be resolved to a type
org.talend cannot be resolved to a type
org.talend cannot be resolved to a type
com.talend cannot be resolved to a type
com.talend cannot be resolved to a type

at bench1.shadowfileinputtodelimitedoutput_0_1.ShadowFileInputToDelimitedOutput.tFileInputDelimitedProcess(ShadowFileInputToDelimitedOutput.java:596)
at bench1.shadowfileinputtodelimitedoutput_0_1.ShadowFileInputToDelimitedOutput.runJobInTOS(ShadowFileInputToDelimitedOutput.java:1013)
at bench1.shadowfileinputtodelimitedoutput_0_1.ShadowFileInputToDelimitedOutput.main(ShadowFileInputToDelimitedOutput.java:894)

Preview error. Some settings must be changed.
Note: Preview errors are generally due to a wrong encoding setting.
org.talend.designer.runprocess.shadow.ShadowFilePreview.preview(ShadowFilePreview.java:90)
org.talend.metadata.managment.ui.utils.ShadowProcessHelper.getCsvArray(ShadowProcessHelper.java:383)
org.talend.repository.ui.wizards.metadata.connection.files.delimited.DelimitedFileStep2Form$PreviewProcessor.nonUIProcessInThread(DelimitedFileStep2Form.java:1309)
org.talend.commons.ui.swt.thread.SWTUIThreadProcessor$1.run(SWTUIThreadProcessor.java:74)

Kind Regards,
Asha
One Star

Re: Splitting up into multiple CSV files based on column values then FTP

Hi,
How to filter data dynamically during run time based on column and it's value?(I will give column name and it's value during batch file execution)
Can any one share some suggestions on this?
Regards,
Asha