One Star

Filtering lines in tFileInputDelimited component

Hi,
I need to read data from a CSV file in which the header line is repeated every N lines. Example:
ID VAL
A, 1
B, 2
C, 3
ID VAL
D, 4
E, 5
...
The tFileInputDelimited component can skip a given number of header and footer lines, but I still get a format exception each time it reaches a column header line in the middle of the file. What is the most elegant way to suppress lines matching some regex, such as ID.* in the example above?
Thank you!
--Alik
3 REPLIES
Community Manager

Re: Filtering lines in tFileInputDelimited component

Hello
The tSampleRow component can fit your need. With tSampleRow, you can choose a list of line numbers and/or a list of ranges.
See the screenshot.
Best regards

shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: Filtering lines in tFileInputDelimited component

Under which category can I find the tSampleRow component? Also, I can't see how this component would help. The format exceptions are thrown by the tFileInputDelimited component when it tries to parse the lines containing column headers. I think tFileInputDelimited will keep throwing those exceptions even with tSampleRow connected to its output, as shong suggested, because they occur before tSampleRow ever has a chance to see the data.
I can think of several ways to solve this problem in Kettle ETL. Its CSV file input component accepts a filter that is applied to each line before any attempt to parse it. It is also possible to read the file line by line with a generic file reader component, filter out the unwanted rows with a second component, and finally parse the data with a third, CSV row component. Is something like that available in Talend?
Thank you,
--Alik
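
The read, filter, then parse idea from the post above can be sketched in plain Java, for example inside a custom routine. This is only an illustration under assumptions: the class and method names are hypothetical, the header pattern (a line starting with "ID" followed by whitespace) is a slightly stricter variant of the ID.* regex from the first post, and the data lines are assumed to be comma-separated as in the sample. In a Talend job the same three steps would presumably map to a full-row file reader such as tFileInputFullRow, a row filter such as tFilterRow, and a field extractor such as tExtractDelimitedFields, but that mapping is not confirmed anywhere in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Talend: filter repeated headers before parsing.
final class HeaderFilterSketch {

    // Assumption: a repeated header is any line starting with "ID" plus whitespace.
    static final Pattern HEADER = Pattern.compile("^ID\\s.*");

    // Steps 1 and 2: take the raw lines and drop blank lines and header lines.
    // No line numbers are needed, so the file size does not matter.
    static List<String> dropHeaders(List<String> rawLines) {
        List<String> kept = new ArrayList<>();
        for (String line : rawLines) {
            if (line.trim().isEmpty() || HEADER.matcher(line).matches()) {
                continue;
            }
            kept.add(line);
        }
        return kept;
    }

    // Step 3: split each surviving line on the comma and trim every field.
    static List<String[]> parse(List<String> dataLines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : dataLines) {
            String[] fields = line.split(",", -1);
            for (int i = 0; i < fields.length; i++) {
                fields[i] = fields[i].trim();
            }
            rows.add(fields);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample lines taken from the first post.
        List<String> raw = Arrays.asList(
                "ID VAL", "A, 1", "B, 2", "C, 3", "ID VAL", "D, 4", "E, 5");
        for (String[] row : parse(dropHeaders(raw))) {
            // Numeric conversion happens only after the headers are gone,
            // so no format exception is raised for them.
            System.out.println(row[0] + " -> " + Integer.parseInt(row[1]));
        }
    }
}
```

Because the filter runs on raw text before any typed parsing, it does not need to know how many header lines the file contains or where they are.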
One Star

Re: Filtering lines in tFileInputDelimited component

Another problem with the solution suggested by shong is that I would need to know the line numbers to filter out, and in my application I don't. The size of the input file is not fixed, so it is not possible to tell in advance how many column header lines have to be removed.
--Alik