[resolved] tfilelist, check header row and match up columns

Hello,
I have a (Unix) directory containing a number of files. I can pick out all files of one type, for example "filetype1.csv", using a regex.
However, some files of the same type have different headers and different numbers of columns, which don't always match up or come in the same order.
I'm currently just going from tFileList to a delimited file input. I would like to either:
a) split the files coming out of the directory based on the header (i.e. if the header matches regex1, put the file into inputfile1; if it matches regex2, put it into inputfile2), so the columns all match up in the merged file, or
b) extract only certain columns from each file before putting them all together in the delimited input file, so that the combined inputs again match up in terms of column headings.
Is there any way to do this without writing custom Java code to do it all?
Thanks
P
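
For reference, option (b) can be sketched in plain Java if a little custom code turns out to be acceptable. This is only a rough illustration, not a Talend component: the class name ColumnAligner, the target columns (id, name, amount) and the comma separator are all assumptions made up for the example, and the naive split does not handle quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Talend): reorders each file's columns
// into one fixed target layout, using the file's own header line as a guide.
class ColumnAligner {

    // For each target column, find its position in this file's header (-1 if absent).
    static int[] indexMap(String headerLine, String separator, List<String> targetColumns) {
        List<String> header = new ArrayList<>();
        for (String name : headerLine.split(Pattern.quote(separator), -1)) {
            header.add(name.trim());
        }
        int[] map = new int[targetColumns.size()];
        for (int i = 0; i < map.length; i++) {
            map[i] = header.indexOf(targetColumns.get(i));
        }
        return map;
    }

    // Rebuild one data line in target order; missing columns become empty fields.
    static String align(String dataLine, String separator, int[] map) {
        String[] fields = dataLine.split(Pattern.quote(separator), -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < map.length; i++) {
            if (i > 0) {
                out.append(separator);
            }
            if (map[i] >= 0 && map[i] < fields.length) {
                out.append(fields[map[i]]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed target layout; replace with the columns you actually need.
        List<String> target = Arrays.asList("id", "name", "amount");
        int[] map = indexMap("amount,extra,id,name", ",", target);
        System.out.println(align("9.50,x,42,Ann", ",", map)); // prints 42,Ann,9.50
    }
}
```

The index map is built once per file from its header and then applied to every data line, so files with extra, missing or reordered columns all end up in the same layout.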

Accepted Solutions

Re: [resolved] tfilelist, check header row and match up columns

For (a), there is a solution that needs no custom code:
1) Read the header from each file (in the schema for your input, define a single column to hold the whole line).
2) Use a tMap and run your regex in the tMap output filter. You will have one output table per target file. (This part would be cleaner with a tJavaRow.)
3) Using a "Run if" link, read the input file with the correct input component.
If you need more details, please ask... I can work up an example.

Re: [resolved] tfilelist, check header row and match up columns

Hi,
Thanks very much for your message, it sounds like a sensible solution. I think I have an idea of how to do what you describe... but if you could give me an example, that would be really great.
I've done a fair bit of Java, so I am quite happy to write some custom code (using tJavaRow instead of tMap if it makes more sense, as you suggest)... it's just that I've never done Java in Talend and am not quite sure how to start without some example code.
Thank you very much
P

Re: [resolved] tfilelist, check header row and match up columns

Just a note before you implement: I forgot that the tFileInputMSDelimited may make this much simpler. It is designed to work with single multischema files, but it may work for this problem. If it does, it would be as simple as:
tFileList
|
tFileInputMSDelimited--file1-->(rest of job for file 1)
                     |--file2-->(rest of job for file 2)

Here's the original solution I envisioned.
tFileList
|
iterate
|
tFileInputDelimited --row--> tJavaRow --if--> tFileInputDelimited --> (rest of job for file 1)
                                     |--if--> tFileInputDelimited --> (rest of job for file 2)

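In the first tFileInputDelimited, set it up to read one row into a single column (by setting the limit to 1 and the field separator to "").
In the tJavaRow, set a context variable to the name of the file flow you want to run, based on your regex logic. Note that String.matches has to match the entire header line, not just part of it.
For example:
// tJavaRow: choose the flow for this file from its header line
if (input_row.header_line.matches("some crazy regex")) {
    context.file_to_run = "file_1";
}

In the "if" links, check this variable to execute the correct file processing flow. Putting the literal first avoids a NullPointerException if the variable was never set, i.e.:
"file_1".equals(context.file_to_run)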
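
If you want to try the routing logic outside Talend first, here is a rough plain-Java sketch of the same idea with several header layouts. The class name HeaderRouter, the two regexes and the "unknown" fallback are all made up for illustration; inside a job, the result of route(...) would be what you assign to context.file_to_run.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-alone version of the tJavaRow logic (not Talend API).
class HeaderRouter {

    // Assumed layouts: flow name -> regex the whole header line must match.
    // LinkedHashMap keeps insertion order, so the first matching layout wins.
    private static final Map<String, Pattern> LAYOUTS = new LinkedHashMap<>();
    static {
        LAYOUTS.put("file_1", Pattern.compile("^id,name,amount$"));
        LAYOUTS.put("file_2", Pattern.compile("^id,amount,.*"));
    }

    // Read only the first line, like an input limited to one row.
    static String readHeader(Reader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine();
            return line == null ? "" : line;
        }
    }

    // Return the flow name for a header, or "unknown" if no layout matches.
    static String route(String headerLine) {
        for (Map.Entry<String, Pattern> layout : LAYOUTS.entrySet()) {
            if (layout.getValue().matcher(headerLine).matches()) {
                return layout.getKey();
            }
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        String header = readHeader(new StringReader("id,amount,currency\n1,9.50,EUR\n"));
        System.out.println(route(header)); // prints file_2
    }
}
```

Returning an explicit "unknown" value means a file with an unexpected header triggers none of the "if" links, instead of silently reusing the value left over from the previous iteration.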

Re: [resolved] tfilelist, check header row and match up columns

Hi John,
Thanks for your help,
P