We receive a daily file in UTF-8-BOM encoding, because of which our Talend ETL job always misses the first row of the file.
Sample Data in File:
P, 1234, $10
R, 1236, $15
Our flow is:
tFileList ==> tFileInputDelimited ==> tReplicate ==> tFilterRow ==> tMSSqlSCD
tFileInputDelimited on its own is able to process all rows, but once we add tFilterRow, it always misses the first row of every file.
The condition for tFilterRow is column0 Equals "P"
When we configured tLogRow, we found a few special characters prefixed to the first row of every file, for example ???P
Also, when we opened our CSV files in Notepad++, we discovered that the files are encoded as UTF-8-BOM.
We only have an option for UTF-8 in the Advanced settings of tFileInputDelimited.
Please let us know how we can process a UTF-8-BOM file in a Talend job.
Thanks & Regards
So far, the Talend tFileInputDelimited component uses "UTF-8" without BOM. There is a "Custom" option in the Encoding section.
Could you please try it to see if it works?
We are not able to process UTF-8-BOM files. When we run a job over 10 files, it skips the first row of every file each time. We are waiting for the Talend team to respond to our issue.
Talend uses "UTF-8" without BOM. A UTF-8-BOM encoded file contains a three-byte marker (0xEF 0xBB 0xBF) at the start of the file, which is probably not handled by the tFileInputDelimited component.
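Those three bytes, decoded as UTF-8, become the invisible character U+FEFF, so the first field of the file arrives as "\uFEFFP" rather than "P", which is why an Equals "P" condition in tFilterRow rejects the first row of each file. A minimal plain-Java illustration (this is not a Talend API; the class and method names are ours):

```java
import java.nio.charset.StandardCharsets;

// Demo: why an Equals "P" filter rejects the first row of a UTF-8-BOM file,
// and how stripping the decoded BOM (U+FEFF) fixes the comparison.
public class BomDemo {
    // Remove a leading U+FEFF (the decoded UTF-8 BOM) if present.
    public static String stripBom(String field) {
        return field.startsWith("\uFEFF") ? field.substring(1) : field;
    }

    public static void main(String[] args) {
        // Raw bytes of the first field: BOM (0xEF 0xBB 0xBF) followed by 'P'.
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'P'};
        String firstField = new String(raw, StandardCharsets.UTF_8);

        System.out.println(firstField.equals("P"));           // false: hidden BOM in front
        System.out.println(stripBom(firstField).equals("P")); // true after stripping
    }
}
```

This is the same effect you saw in tLogRow: the BOM bytes have no printable form, so they show up as ??? in front of the P.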
Have you already checked the tChangeFileEncoding component to see if it works?
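If that component does not help, another possible workaround is to strip the BOM bytes yourself before parsing, for example in a pre-job routine or a tJava step. A sketch under that assumption (the class and method names below are illustrative, not part of Talend):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PushbackInputStream;

// Sketch: copy a file, dropping a leading UTF-8 BOM (0xEF 0xBB 0xBF) if present,
// so that a downstream UTF-8 parser never sees it.
public class Utf8BomStripper {
    public static void stripBom(File src, File dest) throws IOException {
        try (PushbackInputStream in = new PushbackInputStream(new FileInputStream(src), 3);
             OutputStream out = new FileOutputStream(dest)) {
            byte[] head = new byte[3];
            int n = in.read(head, 0, 3);
            boolean hasBom = n == 3
                    && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB
                    && (head[2] & 0xFF) == 0xBF;
            if (!hasBom && n > 0) {
                in.unread(head, 0, n); // no BOM: push the bytes back unchanged
            }
            byte[] buf = new byte[8192];
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
        }
    }
}
```

The stripped copy can then be fed to tFileInputDelimited with plain UTF-8 encoding, and the first row should pass the tFilterRow Equals "P" condition like every other row.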