One Star

Encoding issue with tFileOutputXML

Hello,
I'm using a tFileOutputXML to write a simple XML file. I must use ISO-8859-1 as the encoding; this works well if I set it as a Custom encoding in Advanced Options. But if there is a character outside ISO-8859-1 (for instance "€"), Talend just outputs "?".
I expect Talend to encode it as "& #8364;" (without the space): this is correctly decoded back to "€" when I use a tFileInputXML, so why is this behavior not consistent?
A workaround is to set UTF-8 encoding on the tFileOutputXML and then use a transformation to get the XML in the mandatory encoding.
Has anyone had the same issue? Do you think a bug report or enhancement request for this has any chance of getting some attention?
Regards,
Eric
edit : I'm using talend 5.0.1
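
For reference, the workaround described above (write UTF-8 first, then transform to the mandatory encoding) could look roughly like the sketch below. It is not Talend code: the class and method names are made up for illustration, and it assumes the file's XML declaration contains exactly encoding="UTF-8". It also replaces unmappable characters everywhere in the file, which is only correct when they occur in text content or attribute values (numeric references are not interpreted in comments, CDATA sections or element names).

```java
import java.io.IOException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Talend.
public class Latin1XmlEscaper {

    /** Replaces each character ISO-8859-1 cannot encode with a numeric character reference. */
    public static String escapeUnmappable(String text) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            int width = Character.charCount(codePoint);
            if (width == 1 && latin1.canEncode((char) codePoint)) {
                out.append((char) codePoint);
            } else {
                // Decimal reference, e.g. U+20AC becomes &#8364;
                out.append("&#").append(codePoint).append(';');
            }
            i += width;
        }
        return out.toString();
    }

    /** Reads a UTF-8 XML file and writes an ISO-8859-1 copy of it. */
    public static void convert(Path utf8In, Path latin1Out) throws IOException {
        String xml = new String(Files.readAllBytes(utf8In), StandardCharsets.UTF_8);
        // Assumption: the declaration is written as encoding="UTF-8".
        xml = xml.replaceFirst("encoding=\"UTF-8\"", "encoding=\"ISO-8859-1\"");
        Files.write(latin1Out, escapeUnmappable(xml).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

Because the replacement happens after the XML has been serialized, the "&" of each reference is not escaped a second time.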
5 REPLIES
Community Manager

Re: Encoding issue with tFileOutputXML

Hi
You have to use UTF-8 to read or write the special character "€". I don't think you can read it correctly from a file without UTF-8 encoding.
One Star

Re: Encoding issue with tFileOutputXML

€ is not part of the 8859-1 character set, so it can never be written to a file encoded in 8859-1.
I expect talend to encode it to "€"

What? How exactly do you want Talend to change that? As said, it is not part of the character set.
If you want €, either use 8859-15 or change to UTF-8.
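
A small sketch to illustrate the point about character sets, using only standard Java classes (the class and method names are made up for the example). It checks whether each charset can encode the euro sign and shows that Java's String.getBytes substitutes "?" for a character the target charset cannot represent, which matches the output described in the first post. It assumes the JVM provides ISO-8859-15, which mainstream JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
public class EuroEncodingCheck {
    static final char EURO = '\u20AC';

    /** True if the named charset has a byte representation for the euro sign. */
    public static boolean canEncodeEuro(String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(EURO);
    }

    /** Encodes to ISO-8859-1 and decodes again; unmappable characters come back as '?'. */
    public static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(canEncodeEuro("ISO-8859-1"));   // false
        System.out.println(canEncodeEuro("ISO-8859-15"));  // true
        System.out.println(roundTripLatin1("10 " + EURO)); // 10 ?
    }
}
```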
One Star

Re: Encoding issue with tFileOutputXML

Sorry, the forum broke everything. I'll edit my post: of course a € in a file encoded in 8859-1 wouldn't be possible; what I meant is "& #8364;" without the space.
I can't control the encoding, I'm writing this file for a legacy app. Even ISO-8859-15 would be enough but I simply can't.
One Star

Re: Encoding issue with tFileOutputXML

Ok, yes, that makes more sense. Talend has no built-in way to do this. The best thing would be to set up a new routine and use this (not tested by me):
http://stackoverflow.com/questions/1273986/converting-utf-8-to-iso-8859-1-in-java
One Star

Re: Encoding issue with tFileOutputXML

Ok, yes, that makes more sense. Talend has no built-in way to do this. The best thing would be to set up a new routine and use this (not tested by me):
http://stackoverflow.com/questions/1273986/converting-utf-8-to-iso-8859-1-in-java

Won't the & be encoded when I use the tFileOutputXML, ruining this improvised encoding ?
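
The concern can be illustrated with the JDK's own StAX writer. This is only an analogy: whether tFileOutputXML escapes text the same way is an assumption, and the class name is made up for the example. A standard XML serializer escapes "&" in text content, so a reference inserted into the data beforehand ends up double-escaped.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class; uses the JDK's StAX writer, not Talend's component.
public class AmpersandEscapeDemo {

    /** Serializes the text inside an <amount> element and returns the resulting XML. */
    public static String writeText(String text) {
        try {
            StringWriter buffer = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            writer.writeStartElement("amount");
            writer.writeCharacters(text); // escapes &, < and > in the text
            writer.writeEndElement();
            writer.close();
            return buffer.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The pre-escaped reference is treated as literal text, so its & is escaped again.
        System.out.println(writeText("10 &#8364;"));
    }
}
```

If the component behaves like this, the escaping has to happen after the file is written rather than in the data that feeds it.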