[resolved] tFileInputDelimited not reading international characters (UTF-8)

I am reading UTF-8 encoded CSV text files, but I get errors when reading them with tFileInputDelimited. Once this works, the rows will be saved to Oracle 11g (via tMap -> tOracleOutput). I am not sure whether I then need to set advanced options on the tOracleOutput. The Oracle database has been configured to store multi-byte characters.
Probably something simple I am missing.
I have attached screenshots.
Dave

Accepted Solutions

Re: [resolved] tFileInputDelimited not reading international characters (UTF-8)

So, here is what I did to resolve:
I opened the source file in Firefox (file open), then right-clicked and chose View Page Info, which shows the character encoding. The file was actually encoded as UTF-16, not UTF-8 as I had thought. I then changed the tFileInputDelimited encoding to Custom "UTF-16" in Talend, and it worked fine.
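
For anyone without a browser at hand, the same check can be approximated by looking at the first bytes of the file. The following is a minimal Python sketch, not part of Talend; the function names `sniff_encoding` and `sniff_file` are made up for illustration. It only recognizes files that start with a byte order mark (BOM), so a UTF-16 file written without a BOM would fall through to the default.

```python
import codecs

# BOM signatures, checked in order. UTF-32 must come before UTF-16
# because the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(head: bytes, default: str = "utf-8") -> str:
    """Guess an encoding from the leading bytes of a file (BOM only)."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    # No BOM found: this is a guess, not a detection.
    return default


def sniff_file(path: str) -> str:
    """Read the first four bytes of a file and guess its encoding."""
    with open(path, "rb") as fh:
        return sniff_encoding(fh.read(4))
```

Calling `sniff_file` on the source CSV before configuring the component gives a quick hint about which value to try in the Encoding field.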
Dave

All Replies

Re: [resolved] tFileInputDelimited not reading international characters (UTF-8)

Hi
Try this. Set Encoding "Custom"->"GBK".
Regards,
Pedro

Re: [resolved] tFileInputDelimited not reading international characters (UTF-8)

Thanks. I am investigating another issue with this. If that does not work, I will definitely try this. In any case, I will keep this post updated.
Thanks!
Dave

Re: [resolved] tFileInputDelimited not reading international characters (UTF-8)

"GBK" did not work. I have escalated this to Talend support.
Thanks,
Dave
Community Manager

Re: [resolved] tFileInputDelimited not reading international characters (UTF-8)

Hi Dave
From the error message, we can see that a NumberFormatException is thrown on tFileInputDelimited_2: one of the columns is read using the Integer/int data type. Try changing it to the String data type.
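
As background on why a numeric column can fail to parse when the file encoding is wrong, here is a small Python sketch (an illustration only, not Talend's generated code). It assumes a hypothetical first field "42" stored as UTF-16 with a BOM and decodes it once as UTF-8 and once as UTF-16. The `errors="replace"` option is used as a stand-in for a reader that substitutes a replacement character for invalid bytes.

```python
import codecs

# Hypothetical first field of the CSV, written as UTF-16 LE with a BOM.
raw = codecs.BOM_UTF16_LE + "42".encode("utf-16-le")

# Wrong charset: the BOM bytes are invalid UTF-8 and every second byte is NUL.
wrong = raw.decode("utf-8", errors="replace")

# Right charset: the "utf-16" codec consumes the BOM and picks the byte order.
right = raw.decode("utf-16")


def parses_as_int(text: str) -> bool:
    """Return True if the text can be parsed as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


# repr() shows the hidden characters that break the numeric parse.
print(repr(wrong), parses_as_int(wrong))
print(repr(right), parses_as_int(right))
```

Switching the column to a string type avoids the exception at read time, but the hidden characters are still in the data, which is worth checking before converting back to a number.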
Best regards
Shong

Re: [resolved] tFileInputDelimited not reading international characters (UTF-8)

Shong,
Well, the first column is a short (Integer).
I changed the first column to a string in the tFileInputDelimited and added a tConvertType after it. In the tConvertType, I convert the first column from string to short.
I now get a new error (new "Convert" screenshots attached).
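
The read-as-string-then-convert step can be reproduced outside Talend to see exactly which characters the conversion chokes on. This is a rough Python sketch under assumptions: `to_short` is a hypothetical helper, not how tConvertType is implemented, and the range check mirrors a 16-bit signed short.

```python
SHORT_MIN, SHORT_MAX = -32768, 32767  # range of a 16-bit signed short


def to_short(field: str) -> int:
    """Convert a text field to an integer that fits a 16-bit signed short."""
    try:
        value = int(field.strip())
    except ValueError:
        # repr() makes invisible characters (BOM, NUL, U+FFFD) visible.
        raise ValueError(f"not a number: {field!r}") from None
    if not SHORT_MIN <= value <= SHORT_MAX:
        raise ValueError(f"out of short range: {value}")
    return value
```

If the error message shows escape sequences in front of or between the digits, the problem is the decoding of the file rather than the conversion itself.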
Dave

Re: [resolved] tFileInputDelimited not reading international characters (UTF-8)

Have you tried giving your ACD_No a size?

Re: [resolved] tFileInputDelimited not reading international characters (UTF-8)

I can. However, I think the real issue is that Talend is not decoding the UTF-8 encoded data. In the screenshots, there are characters that Talend cannot resolve. I am struggling with this, though, as I cannot find any other posts that describe this problem.