If I want to read a field of a positional file as a byte array, Talend generates code like columnValuetFileInputPositional_1.getBytes(); I believe the getBytes() method, without any arguments, uses the platform default encoding, which is "Cp1252" on my machine (the JVM default). So, in order to read "ISO-8859-15" encoded files, it should generate code like ...getBytes("ISO-8859-15"), with the charset name as an argument. The other solution would be to change the JVM default encoding. I know how to do this when running the job from the "bat" file (java -Dfile.encoding=ISO-8859-15 -Xms256M -Xmx1024M ...), but I don't know how to set the JVM's default encoding inside Talend. I have tried setting Window > Preferences > General > Workspace > Text file encoding to "ISO-8859-15", but this does not work: I still get wrong results. I do get the correct result if I export the same job and run it after changing the bat file (java -Dfile.encoding=ISO-8859-15 -Xms256M -Xmx1024M ...). Can you help me? Thanks and regards, Sasi
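To make the difference concrete, here is a minimal Java sketch comparing the bytes produced for the euro sign under the two encodings mentioned in the question. The class name GetBytesDemo and the helpers encode and hex are hypothetical, and "windows-1252" is passed explicitly to stand in for the Cp1252 platform default, so the result does not depend on the machine running it.

```java
import java.nio.charset.Charset;

public class GetBytesDemo {
    // Hypothetical helper: encode a string with an explicitly named charset.
    public static byte[] encode(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName));
    }

    // Hypothetical helper: format bytes as space-separated upper-case hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign

        // Stand-in for what getBytes() yields when the default is Cp1252.
        System.out.println("windows-1252: " + hex(encode(euro, "windows-1252"))); // 80

        // What getBytes("ISO-8859-15") yields.
        System.out.println("ISO-8859-15:  " + hex(encode(euro, "ISO-8859-15")));  // A4
    }
}
```

The same character maps to a different byte in each encoding, which is why a byte array built with the default charset can differ from the bytes in the original file.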
I think there may be an encoding-related bug in your job (one that may already be fixed in the latest version). As you can see, there is an encoding setting directly in the component, and this one should be enough, so I think you should try toggling that value: for example, switch it to UTF-8, then switch it back to ISO-8859-15, and run the job again. (Note: in version 2.1.1, the one used for your bug report, it seems you need to switch to built-in property mode before you can change the encoding.) About the VM arguments, I think this will be added, but as a feature. It will most likely be added directly in the TOS preferences, to avoid interfering with the Eclipse parameters.
I guess the problem is not with the reading of the file. Talend reads the file correctly, with the correct encoding specified: intFileInputPositional_1 = new java.io.BufferedReader( new java.io.InputStreamReader(new java.io.FileInputStream((String) ((String) context.getProperty("datadir")) + (String) ((String) context.getProperty("sourceTableOrFileNames"))), "ISO-8859-15")); After this, I am reading a field as a byte array. For this, Talend first reads the field as a string and then calls the .getBytes() method on that string to get the byte array. I am not getting correct results with this. If I read the input field as a string and then call .getBytes("ISO-8859-15") on that string manually, I get correct results.
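The round trip described here can be sketched as follows. This is only an illustration under stated assumptions: an in-memory ByteArrayInputStream stands in for the real file, the class RoundTripDemo and its readLine helper are hypothetical, and "windows-1252" is named explicitly to play the role of the Cp1252 default.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Arrays;

public class RoundTripDemo {
    // Decode raw bytes through an InputStreamReader with an explicit charset,
    // similar in spirit to the reader construction quoted in the post.
    public static String readLine(byte[] raw, String charsetName) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(raw), charsetName))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 0xA4 is the euro sign in ISO-8859-15; 0x41 is 'A'.
        byte[] raw = {(byte) 0xA4, 0x41};

        // Step 1: the field is decoded correctly into a String.
        String field = readLine(raw, "ISO-8859-15");

        // Step 2a: re-encoding with the file's charset restores the original bytes.
        byte[] same = field.getBytes(Charset.forName("ISO-8859-15"));

        // Step 2b: re-encoding with a different charset does not.
        byte[] different = field.getBytes(Charset.forName("windows-1252"));

        System.out.println(Arrays.equals(raw, same));      // true
        System.out.println(Arrays.equals(raw, different)); // false
    }
}
```

The decode step is fine in both cases; only the charset used when converting the String back to bytes decides whether the original bytes come back.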
The problem is that the getBytes() method, without arguments, uses the default encoding (Cp1252 on my Windows machine). I have tried this on version 2.2 as well, and the result is the same. Talend appears to start the JVM with the default encoding, regardless of the VM arguments. So the solution would be either to call .getBytes() with the correct encoding as an argument, or to start the JVM with the proper encoding. (If I export the same Talend job, the one that does not work inside Talend, then change the bat file to include the file.encoding VM argument and run the job, I get correct results.) So, as a workaround, I am reading all the fields as strings and then calling .getBytes("ISO-8859-15") manually. Correct me if I am wrong. Thanks and regards, Sasi
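A minimal sketch of that workaround, plus a quick way to see which default the JVM is actually running with, might look like this. The class FieldBytes and its methods are hypothetical names used for illustration, not something Talend generates or provides.

```java
import java.nio.charset.Charset;

public class FieldBytes {
    // Encoding of the source file, as stated in the thread.
    public static final Charset FILE_CHARSET = Charset.forName("ISO-8859-15");

    // Hypothetical helper: convert a field that was read as a String into
    // bytes using the file's encoding instead of the JVM default.
    public static byte[] toBytes(String field) {
        return field == null ? null : field.getBytes(FILE_CHARSET);
    }

    // Report the defaults the running JVM uses; handy for checking whether
    // a -Dfile.encoding argument was picked up.
    public static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The printed values depend on the machine and the Java version.
        System.out.println(describeDefaults());

        byte[] bytes = toBytes("\u20AC"); // euro sign
        System.out.println(bytes.length); // 1
    }
}
```

Keeping the charset in one constant means the conversion no longer depends on how the JVM was started, which is the point of the workaround.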