I have an issue with my job in the production environment. Here is the scenario of my actual job.
The job reads data from a CSV file using a tFileInputDelimited component and writes it to another CSV file using a tFileOutputDelimited component. In between there is a tMap component with 3 lookups connected to it.
All the input and output file components, including the lookup components, use 'UTF-8' encoding.
The input file contains some accented characters, but when the job writes the data to the target CSV, all the special characters are replaced with question marks ('?').
Input data: wöchentlich,árbol,único
Output data: w??chentlich, a?rbol, u?nico
Expected data: wöchentlich,árbol,único
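Since a source file saved in a non-UTF-8 encoding (ISO-8859-1, for example) can also turn accented characters into '?', one quick sanity check is to confirm the input bytes really are valid UTF-8. The sketch below is plain Java, not a Talend component; the class name `Utf8Check` is hypothetical. It uses a strict `CharsetDecoder` that reports errors instead of silently substituting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true only if every byte sequence decodes as UTF-8.
    // REPORT makes the decoder throw instead of inserting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```

If this reports "valid UTF-8" for the input file, the corruption is being introduced later in the job, which is consistent with the temp-file explanation.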
After investigating the issue, I found that it is caused by a temp file that tMap generates internally to store data on disk when the "Store temp data" option is set to true, which is typically done for large data sets.
On the Linux machine, the data in this temp file is automatically converted from UTF-8 to the system's default locale encoding.
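The effect described above can be reproduced outside Talend: any Java code that writes text to disk without naming a charset uses the JVM default, and characters that charset cannot represent are lost. This is a simplified sketch, not Talend's actual tMap code; the class and method names are hypothetical, and US-ASCII is used as a stand-in for the default charset a JVM typically picks under a C/POSIX locale (older JVMs report it as `ANSI_X3.4-1968`).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileEncodingDemo {

    // Unicode escapes keep the literal independent of the compiler's source encoding.
    static final String INPUT = "w\u00f6chentlich,\u00e1rbol,\u00fanico";

    // Spills text to a temp file in the given charset and reads it back in the same charset,
    // roughly like a disk-backed lookup buffer (hypothetical simplification).
    // String.getBytes(Charset) replaces characters the charset cannot encode with the
    // charset's replacement byte, which is '?' for US-ASCII.
    static String roundTrip(String text, Charset charset) {
        try {
            Path tmp = Files.createTempFile("tmap-spill-", ".tmp");
            try {
                Files.write(tmp, text.getBytes(charset));
                return new String(Files.readAllBytes(tmp), charset);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        // The console itself may print '?' if the terminal locale is not UTF-8.
        System.out.println("UTF-8    : " + roundTrip(INPUT, StandardCharsets.UTF_8));
        // US-ASCII stands in for a C/POSIX-locale default charset (assumption).
        System.out.println("US-ASCII : " + roundTrip(INPUT, StandardCharsets.US_ASCII));
    }
}
```

The UTF-8 round trip is lossless, while the US-ASCII one yields `w?chentlich,?rbol,?nico`. How many '?' appear per character depends on exactly where the conversion happens, so this will not necessarily match the job's output byte for byte.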
So my question is: how can I change the encoding of the temp file within Talend? Any other suggestions or ideas to overcome this issue would be a great help.
Thanks in advance,
Not sure it will work, but you can try to force the default encoding for the job.
Go to the Run tab > Advanced settings, then add a specific JVM argument like this one:
Let us know.
Note: I didn't see that you had already tested this option successfully.
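In standard Java, the JVM's default charset comes from the `file.encoding` system property (for example `-Dfile.encoding=UTF-8`), which on Linux is derived from the locale (`LANG`/`LC_ALL`) unless it is set explicitly. Whether tMap's temp files honor it is an assumption, not something confirmed in this thread. One way to check what a job's JVM actually uses is to log these values, for example from a tJava component; the class below is a hypothetical standalone sketch of that check.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Summarizes the settings that decide which charset the JVM uses for any
    // reader or writer created without an explicit encoding.
    // Unset environment variables are printed as "null".
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset()
                + ", LANG=" + System.getenv("LANG")
                + ", LC_ALL=" + System.getenv("LC_ALL");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it as `java -Dfile.encoding=UTF-8 DefaultCharsetCheck` should report a UTF-8 default charset. The property has to be passed at JVM startup (hence the Run tab setting): calling `System.setProperty("file.encoding", ...)` from inside a running job does not reliably change `Charset.defaultCharset()`, which is cached once computed. Exporting a UTF-8 locale such as `LANG=en_US.UTF-8` in the environment that launches the job is an alternative, and from JDK 18 onward (JEP 400) the default charset is UTF-8 regardless of locale.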