One Star

encoding problem with positional files : java

If I want to read a field of a positional file as a byte array, Talend generates code like
columnValuetFileInputPositional_1.getBytes();
I believe the getBytes() method, without any arguments, uses the JVM default encoding ("Cp1252" on my machine).
So, in order to read "ISO-8859-15" encoded files, it should generate code like
...getBytes("ISO-8859-15"), with the charset name as an argument.
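To make the difference concrete, here is a minimal sketch (not the code Talend generates). The class name GetBytesDemo and the helpers encode and hex are made up for illustration, and it uses the Charset overload of getBytes rather than the String overload mentioned above. It assumes the JVM ships the ISO-8859-15 and windows-1252 (Cp1252) charsets.

```java
import java.nio.charset.Charset;

class GetBytesDemo {
    // Encode with an explicit charset, so the result does not depend on the JVM default.
    static byte[] encode(String value, String charsetName) {
        return value.getBytes(Charset.forName(charsetName));
    }

    // Render bytes as upper-case hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // euro sign, encoded differently by the two charsets

        // Depends on the JVM default charset, so the output varies by machine.
        System.out.println("default (" + Charset.defaultCharset() + "): " + hex(euro.getBytes()));

        System.out.println("ISO-8859-15:  " + hex(encode(euro, "ISO-8859-15")));  // A4
        System.out.println("windows-1252: " + hex(encode(euro, "windows-1252"))); // 80
    }
}
```

The same string gives different bytes depending on the charset, which is why the argument-less getBytes() call is only correct when the JVM default happens to match the file encoding.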
The other solution would be to change the JVM default encoding.
I know how to change this when running the exported job from the "bat" file
( java -Dfile.encoding=ISO-8859-15 -Xms256M -Xmx1024M ...)
But I don't know how to set the JVM's default encoding inside Talend.
I have tried setting Window > Preferences > General > Workspace > Text file encoding to "ISO-8859-15", but this does not work: I still get wrong results.
I get correct results if I export the same job and run it after changing the bat file ( java -Dfile.encoding=ISO-8859-15 -Xms256M -Xmx1024M ...).
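One way to check whether the VM argument actually reached the JVM that runs the job is to print the default charset from inside it. This is only a diagnostic sketch: the class name DefaultCharsetCheck and its methods are hypothetical, and how -Dfile.encoding is honored can vary with the JVM version.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Report both the system property and the charset the JVM actually uses by default.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    // True if the JVM default charset is the one named (aliases are resolved by Charset.forName).
    static boolean defaultIs(String charsetName) {
        return Charset.defaultCharset().equals(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        System.out.println(describe());
        System.out.println("default is ISO-8859-15: " + defaultIs("ISO-8859-15"));
    }
}
```

Running it once from the bat file and once from inside the tool should show whether the two JVMs start with the same default encoding.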
Can you help me?
Thanks and Regards
Sasi
6 REPLIES
One Star

Re: encoding problem with positional files : java

The -Dfile.encoding=ISO-8859-15 VM argument works with Eclipse but not with Talend.
May I know why?
Community Manager

Re: encoding problem with positional files : java

Hi
I have tested a job as you described and found that it's a bug. Can you report it in our bugtracker, please?
Thanks for your support!
Best regards
shong
One Star

Re: encoding problem with positional files : java

I have created a new bug report:
http://www.talendforge.org/bugs/view.php?id=2349
thanks
Employee

Re: encoding problem with positional files : java

I think there may be a bug in your job related to the encoding (fixed, for example, in the latest version).
As you can see, there is an encoding setting directly in the component (this one should be enough). I think you should try switching this encoding value: for example, switch to UTF8, then switch back to ISO-8859-15, then run the job again.
(NB: it seems you need to switch to built-in property mode before switching the encoding in version 2.1.1, the version used for your bug report.)
About the VM arguments, I think this will be added, but as a feature. It will most likely be added directly in the TOS preferences, to avoid interfering with the Eclipse parameters.
One Star

Re: encoding problem with positional files : java

I guess the problem is not with reading the file. Talend reads the file correctly, with the encoding specified:
intFileInputPositional_1 = new java.io.BufferedReader(
new java.io.InputStreamReader(new java.io.FileInputStream((String) ((String) context.getProperty("datadir")) + (String) ((String) context.getProperty("sourceTableOrFileNames"))), "ISO-8859-15"));
After this, I am reading a field as a byte array.
For this, Talend first reads the field as a string and then calls the .getBytes() method on that string to get the byte array.
I am not getting correct results with this.
If I read the input field as a string and then call .getBytes("ISO-8859-15") on that string manually, I get correct results.

The problem is that the getBytes() method, without arguments, uses the default encoding (Cp1252 on my Windows machine).
I have tried this on version 2.2 as well, and the result is the same.
Talend starts the JVM with the default encoding, irrespective of the VM arguments.
So the solution would be either to call the .getBytes() method with the correct encoding as an argument, or
to start the JVM with the proper encoding.
(If I export the same job that does not work inside Talend, change the bat file to include the file.encoding VM argument and run the job, I get correct results.)
So, as a workaround, I am reading all the fields as strings and then calling .getBytes("ISO-8859-15") manually.
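A rough sketch of that workaround outside Talend, assuming a fixed-width field addressed by start offset and length. The class PositionalFieldBytes, its methods and the sample data are hypothetical, not Talend code; the point is only that the same charset is used for decoding the file and for re-encoding the field.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class PositionalFieldBytes {
    // Decode the file with an explicit charset and return its first line.
    static String readFirstLine(Path file, Charset cs) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), cs))) {
            return in.readLine();
        }
    }

    // Cut a fixed-width field out of the line and re-encode it with the same charset.
    static byte[] fieldBytes(String line, int start, int length, Charset cs) {
        return line.substring(start, start + length).getBytes(cs);
    }

    public static void main(String[] args) throws IOException {
        Charset cs = Charset.forName("ISO-8859-15");
        Path file = Files.createTempFile("positional", ".txt");
        try {
            // Sample record: "AB", the euro sign (0xA4 in ISO-8859-15), "CD", newline.
            Files.write(file, new byte[] {'A', 'B', (byte) 0xA4, 'C', 'D', '\n'});
            String line = readFirstLine(file, cs);
            byte[] field = fieldBytes(line, 2, 1, cs);
            System.out.printf("field byte: %02X%n", field[0] & 0xFF); // A4
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With the charset passed explicitly on both sides, the bytes of the field match the bytes in the file regardless of the JVM default.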
Correct me if I am wrong.
Thanks and regards,
Sasi
Employee

Re: encoding problem with positional files : java

Sasi,
The right solution is to correct the encoding used with the .getBytes() method, so that the other encoding options are not affected.
I will update your bug 2349.

Thanks for your support,