tFileInputDelimited - Columns(s) missing Issue


For each input file, my job performs the following steps:
1. Validate the input file against the schema and write invalid records to a ***.invalid file.
2. Filter the valid records by the filter criteria and write the filtered records to a ***.filtered file.
3. Write all remaining records to another directory under the same file name as the input.
Step 1 has a problem, while steps 2 and 3 work fine.
The error "Column(s) missing" appears in the output ***.invalid file.
A sample ***.invalid file is as follows:
------------------------
21|3|1|xxxxxxxx|422021200801087|xxxxxxxx|xxx|20120213064647|0|||||||||||||||||||||Column(s) missing - Line: 9503
------------------------
When I searched the original input file for the values above, only the following record matched:
------------------------
21|3|1|xxxxxxxx|422021200801087|xxxxxxxx|xxx|20120213064647|0|0|422|02|0|188.140.138.180|0|0|18||taif|2|1|1|ip|ip|1612108893||source_file_name|+timezone
------------------------
The schema has the same number of columns as the record above.
So my questions are:
1. What is causing this issue?
2. What is the logic behind "check each row structure against schema" and "trim all column"?
3. Why isn't the complete original record written to the invalid file?
4. Every invalid file contains only one record. Do I need to check "Append" in the corresponding tFileOutputDelimited_3?
5. What is "Append" used for? Is it for appending different files into the same output? In my case I would like each input file to have a separate invalid file.
BTW, when I ran the same input file again in my local environment, no invalid file was created at all.

Re: tFileInputDelimited - Columns(s) missing Issue

Hi
1, 3. "Column(s) missing" means the number of columns in the input file does not match the schema of tFileInputDelimited. Count the field separators in line 9503; some separators are missing.
2. "check each row structure against schema": this checks the number of columns and the data types of the input data.
"trim all column": for example, if one column in the delimited file is " abc ", then with "trim all column" checked it is loaded into Talend as "abc". Leading and trailing spaces are removed.
4, 5. There is no need to check "Append" here. With it checked, Talend continues writing at the end of an existing delimited file instead of overwriting it from the beginning.
You can find more information about tFileInputDelimited in the TOS documentation.
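As a rough illustration of the two checks described above, here is a minimal Java sketch that counts the fields in a delimited line and trims each value. It is only a sketch under stated assumptions, not Talend's generated code: the class and method names are hypothetical, the sample row is made up, and quoting or escape characters are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Talend.
class RowStructureSketch {

    // Split a line on a single-character separator, keeping empty fields
    // (including trailing ones), so "a||" yields three fields.
    static List<String> splitFields(String line, char sep) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == sep) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(line.substring(start));
        return fields;
    }

    // True when the line has exactly the number of columns the schema expects.
    static boolean hasExpectedColumns(String line, char sep, int expected) {
        return splitFields(line, sep).size() == expected;
    }

    // Remove leading and trailing whitespace from every field.
    static List<String> trimAll(List<String> fields) {
        List<String> trimmed = new ArrayList<>();
        for (String f : fields) {
            trimmed.add(f.trim());
        }
        return trimmed;
    }

    public static void main(String[] args) {
        String row = "21| abc |0||";                // made-up sample row
        List<String> fields = splitFields(row, '|');
        System.out.println(fields.size());           // 5 fields
        System.out.println(hasExpectedColumns(row, '|', 28)); // false
        System.out.println(trimAll(fields).get(1));  // abc
    }
}
```

Running the same count on the suspect line and on a known good line from the input file shows quickly whether separators are really missing.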

Regards,
Pedro

Re: tFileInputDelimited - Columns(s) missing Issue

Hi Pedro,
Thanks for your response.
I may not have made myself clear.
The output in the *.invalid file is listed below:
------------------------
21|3|1|xxxxxxxx|422021200801087|xxxxxxxx|xxx|20120213064647|0|||||||||||||||||||||Column(s) missing - Line: 9503
------------------------
The original input file contains no record like that. I can only find the following complete record in it:
------------------------
21|3|1|xxxxxxxx|422021200801087|xxxxxxxx|xxx|20120213064647|0|0|422|02|0|188.140.138.180|0|0|18||taif|2|1|1|ip|ip|1612108893||source_file_name|+timezone
------------------------
The schema has the same number of columns as the record above.
As you explained, "check each row structure against schema" is used to check the number of columns and the data types of the input data.

Re: tFileInputDelimited - Columns(s) missing Issue

Hi
21|3|1|xxxxxxxx|422021200801087|xxxxxxxx|xxx|20120213064647|0|||||||||||||||||||||
21|3|1|xxxxxxxx|422021200801087|xxxxxxxx|xxx|20120213064647|0|0|422|02|0|188.140.138.180|0|0|18||taif|2|1|1|ip|ip|1612108893||source_file_name|+timezone
They are not the same. The first line contains 30 columns, while the second line contains 28.
The job is correct.
Regards,
Pedro

Re: tFileInputDelimited - Columns(s) missing Issue

Hi Pedro,
Thanks.
After further investigation, it might be caused by processing a file that the previous step has not finished transferring.
Say the file is large:
1. It takes time to SFTP it from the remote host to the local directory.
2. Before the transfer to the local directory has finished, tWaitForFile picks up the file and starts further processing.
Do you know how to make tWaitForFile hold off until the file has finished transferring?
Or is there another way to avoid this?
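One generic way to guard against a half-transferred file is to poll its size and only continue once the size has stopped changing for a few consecutive checks. The Java sketch below illustrates that idea; it is not a tWaitForFile feature, and the class name, method name, and parameters are all hypothetical. Whether and where such a check fits into the job (for example in custom Java code before the file is read) is an assumption to verify.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Talend.
class StableFileWait {

    // Polls the file size every pollMillis milliseconds. Returns true once the
    // size has been unchanged for stableChecks consecutive polls, or false if
    // maxPolls polls pass without the size settling.
    static boolean waitUntilStable(Path file, long pollMillis, int stableChecks, int maxPolls)
            throws IOException, InterruptedException {
        long lastSize = -1;
        int unchanged = 0;
        for (int i = 0; i < maxPolls; i++) {
            long size = Files.size(file);
            if (size == lastSize) {
                unchanged++;
                if (unchanged >= stableChecks) {
                    return true;
                }
            } else {
                // Size changed (or first poll): restart the stability count.
                unchanged = 0;
                lastSize = size;
            }
            Thread.sleep(pollMillis);
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("transfer", ".txt"); // stand-in for the incoming file
        Files.write(tmp, "21|3|1".getBytes());
        System.out.println(waitUntilStable(tmp, 50L, 3, 20));
        Files.delete(tmp);
    }
}
```

A stalled transfer also looks stable, so this is a heuristic. If the sending side can be changed, a more robust convention is to upload under a temporary name and rename the file once the transfer completes, so that the job only ever matches finished files.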

Re: tFileInputDelimited - Columns(s) missing Issue

Hi
Got it. But...
Let's test the file first, before we conclude that this is a tWaitForFile issue.
tFileInputDelimited --Reject-->tLogRow
Use tFileInputDelimited to load the file.
Does the job still produce a reject row?
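The same question can be cross-checked outside Talend by scanning the fully transferred file and listing every line whose field count differs from the schema. The following is a minimal Java sketch with hypothetical names and made-up sample data; it counts separators only and ignores quoting, so it is a rough check rather than a replacement for the Reject flow.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Talend.
class RejectScan {

    // Returns the 1-based numbers of all lines whose field count
    // (separators + 1) differs from the expected column count.
    static List<Integer> linesWithWrongFieldCount(Reader in, char sep, int expected)
            throws IOException {
        List<Integer> bad = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        int lineNo = 0;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            int fields = 1;
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == sep) {
                    fields++;
                }
            }
            if (fields != expected) {
                bad.add(lineNo);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: the second line is one field short.
        String sample = "a|b|c\na|b\na|b|c\n";
        System.out.println(linesWithWrongFieldCount(new StringReader(sample), '|', 3)); // [2]
    }
}
```

To scan a real file, pass a `java.io.FileReader` for the input path instead of the `StringReader`. If the complete file yields no bad lines while the job still rejects rows, that points toward the file being read before the transfer finished.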
Regards,
Pedro

Re: tFileInputDelimited - Columns(s) missing Issue

Please check my latest post:
http://www.talendforge.org/forum/viewtopic.php?id=22122
It describes experimental jobs on large files.