One Star

Change encoding in ESB route (UTF-8 to Windows-1252)

Hello everybody,
This is my first message on the forum.
I want to convert a file from UTF-8 to windows-1252 (Cp1252) in a Talend route.
Here is the approach that looked best to me:
- Start component: cFtp
In the advanced options, I set the charset to "utf-8", the charset of my file.
- Middle component: cConvertBody
I set the class to: byte[].class, "Cp1252"
- End component: cFtp
I set the charset to "Cp1252", the encoding I want for my file.
This method doesn't work and I'm getting a little desperate. Do you have any idea that could help me?
Thank you in advance.
PS: I have attached the options of my components.
   
4 REPLIES
Fifteen Stars

Re: Change encoding in ESB route (UTF-8 to Windows-1252)

I have not done this before, but was interested in your problem. I don't believe you will be able to do this in the way you are trying (however, I may be wrong). What I would attempt is making use of a cProcessor component and trying to do the conversion in Java.
Take a look at this site (http://java67.blogspot.co.uk/2015/05/how-to-convert-byte-array-to-string-in-java-example.html) for an example of how to convert a byte[] to a String of a particular encoding.
However, before you do that, you need to get the data as a byte[]. 
A byte is a primitive type in Java, not a class, so I don't think your byte[].class conversion will work as written (strictly speaking, byte[] is an array type, so byte[].class on its own is valid Java). I would convert the body to String.class instead. The next component should then be the cProcessor. Once in the cProcessor, you can get hold of your data using code similar to the line below:
String myString = exchange.getIn().getBody(String.class);

You can then refer to the post below to convert the String to a byte[] in the cProcessor:
http://stackoverflow.com/questions/18571223/how-to-convert-java-string-into-byte
Then use the post I gave in the first paragraph (http://java67.blogspot.co.uk/2015/05/how-to-convert-byte-array-to-string-in-java-example.html) to convert the encoding. 
Then use code very similar to the line below to put your newly converted String back into the body:
exchange.getIn().setBody(myConvertedString);

Then the next component *should* have the converted String in the message.
As I said, I have not tried this, but I suspect that this (or a slight variant on this) logic should work for you.
I'd be interested to hear if it does.
Rilhia Solutions
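
The cProcessor steps described in the reply above could be sketched in plain Java roughly as follows. This is a minimal sketch that has not been tried inside Talend: the class name Cp1252Sketch and the method utf8ToCp1252 are hypothetical, and it assumes the incoming bytes really are UTF-8. One detail worth noting is that a Java String has no byte encoding of its own, so the sketch returns a byte[] encoded as windows-1252 rather than a "converted" String. Inside a cProcessor, the input might be read with exchange.getIn().getBody(byte[].class) and the result written back with exchange.getIn().setBody(...), but that wiring is an assumption and is not shown here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class Cp1252Sketch {

    // "windows-1252" is the canonical java.nio name; "Cp1252" is an alias for it.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode the UTF-8 bytes into a String, then re-encode that String as windows-1252.
    static byte[] utf8ToCp1252(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(CP1252);
    }

    public static void main(String[] args) {
        // "\u00e9t\u00e9" is the word "été", written with escapes so the source file encoding does not matter.
        byte[] utf8 = "\u00e9t\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] cp1252 = utf8ToCp1252(utf8);
        System.out.println(utf8.length + " bytes in, " + cp1252.length + " bytes out"); // 5 bytes in, 3 bytes out
    }
}
```

Characters that do not exist in windows-1252 are replaced by "?" with this approach, so it only round-trips text that fits in that code page.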
Fifteen Stars

Re: Change encoding in ESB route (UTF-8 to Windows-1252)

It turns out I may have been wrong in my assertion that you can't do what you want in the way you want, although the way I suggested should still work (the long way around :-) ).
Rilhia Solutions
One Star

Re: Change encoding in ESB route (UTF-8 to Windows-1252)

Hello rhall_2.0,
Thank you very much for your answer!
I tried your method:
cFtp -> cConvertBody (String.class) -> cProcessor (see below) -> cFtp

After this route, a file without accented characters (I am French, so accents are used) is reported as "ANSI as UTF-8", but if I add an "é", "è", "à", etc. to the file, it is reported as "ANSI".
The "ANSI as UTF-8" format is (presumably) reported because those characters are encoded identically in UTF-8 and in ANSI.
I have doubts about the solution. Do you think this is normal? Thank you again for your help.
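
The observation above can be checked with a few lines of Java. This is only a sketch, assuming that "ANSI" here means windows-1252 and that the reported format is an editor's guess based on the bytes in the file; the class name AnsiVsUtf8Sketch and its method are hypothetical. It compares the bytes that the same text produces in both encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class; the names are illustrative only.
class AnsiVsUtf8Sketch {

    static final Charset CP1252 = Charset.forName("windows-1252");

    // True when the text produces exactly the same bytes in UTF-8 and in windows-1252.
    static boolean sameBytesInBothEncodings(String text) {
        return Arrays.equals(text.getBytes(StandardCharsets.UTF_8), text.getBytes(CP1252));
    }

    public static void main(String[] args) {
        // Plain ASCII text: identical bytes, so the file is valid in either encoding.
        System.out.println(sameBytesInBothEncodings("Bonjour")); // true
        // "\u00e9t\u00e9" is "été": one byte per accent in windows-1252, two bytes in UTF-8.
        System.out.println(sameBytesInBothEncodings("\u00e9t\u00e9")); // false
    }
}
```

A file containing only ASCII characters is byte-for-byte the same in both encodings, so a tool cannot tell them apart. Once an accented character is present, the single byte 0xE9 for "é" is not valid UTF-8 on its own, which is consistent with the file then being reported as "ANSI".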
Fifteen Stars

Re: Change encoding in ESB route (UTF-8 to Windows-1252)

OK, I think we are nearly there. This is slightly more complicated than I had first thought. Take a look at the accepted answer here (http://stackoverflow.com/questions/28484064/windows-1252-to-utf-8). It seems to make sense. It is doing the reverse of what you are doing, but should be easy enough to get it to do what you want.
Rilhia Solutions
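
Without reproducing the linked answer, the idea of reversing the direction of a conversion can be sketched with a small transcoder whose source and target charsets are parameters. This is a hedged illustration only: the class TranscodeSketch and its methods are hypothetical names, and nothing here has been tested inside a Talend route.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class TranscodeSketch {

    static final Charset CP1252 = Charset.forName("windows-1252");

    // Read characters using the source charset and write them using the target charset.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to) throws IOException {
        Reader reader = new InputStreamReader(in, from);
        Writer writer = new OutputStreamWriter(out, to);
        char[] buffer = new char[4096];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, count);
        }
        writer.flush(); // push any buffered characters to the underlying stream
    }

    // Convenience wrapper for in-memory data.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transcode(new ByteArrayInputStream(input), from, out, to);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e0 bient\u00f4t".getBytes(StandardCharsets.UTF_8);
        byte[] cp1252 = transcode(utf8, StandardCharsets.UTF_8, CP1252); // UTF-8 to windows-1252
        byte[] back = transcode(cp1252, CP1252, StandardCharsets.UTF_8); // and the reverse direction
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```

Swapping the two charset arguments is all it takes to go in the other direction. The stream version avoids loading a whole file into memory, which may matter for large files.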