Japanese Characters: ensuring proper data handling




Talend newbie here; I was told to use Talend Open Studio for Data Integration as part of a current project. 


I've been searching the documentation and the forums for authoritative, all-encompassing guidance on this topic, but I've not been able to find it, so I'm hoping someone here can point me in the right direction. 



There are hundreds of Excel spreadsheets with data in Japanese (e.g., "レター対応"). I need to transform this data and output to CSV files for loading into other systems. What must I do to ensure the Japanese data is maintained and not garbled into something unintelligible, such as a bunch of question marks?



I'm sure that Talend can do this; the mere existence of the Japanese discussion forums within the community is highly encouraging. But the suggestions I've seen are scattered across many forum postings, so it would be easy for me to miss something. Any ideas?





PS. Here are the most promising posts that I've already read: 




Re: Japanese Characters: ensuring proper data handling

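If you simply want to take the characters and output them to CSV (without translation), the main thing to focus on is making sure the character encoding you are using is UTF-8. You can set this in a number of components (in some it may be under the advanced settings). Question marks in the output usually mean the text was written with an encoding that cannot represent Japanese characters.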
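To see why the encoding setting matters, here is a minimal plain-Java sketch (Talend jobs compile to Java, but this is not Talend-generated code). The class name `EncodingDemo` and its helper methods are hypothetical, made up for illustration. It round-trips the sample string from the question through UTF-8 and through ISO-8859-1, then writes and re-reads a small CSV file with an explicit UTF-8 charset.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class for illustration; not part of Talend.
class EncodingDemo {

    // Encode the text to bytes with the given charset, then decode it again.
    // String.getBytes replaces characters the charset cannot represent
    // with the charset's replacement byte ('?' for ISO-8859-1).
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Write the lines with an explicit UTF-8 charset and read them back the same way.
    static List<String> writeAndReadUtf8(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "レター対応" written as Unicode escapes, so the source file's own
        // encoding cannot affect the demo.
        String sample = "\u30EC\u30BF\u30FC\u5BFE\u5FDC";

        // UTF-8 can represent every character, so the text survives intact.
        System.out.println(sample.equals(roundTrip(sample, StandardCharsets.UTF_8)));

        // ISO-8859-1 cannot represent Japanese; each character becomes '?'.
        System.out.println(roundTrip(sample, StandardCharsets.ISO_8859_1));

        // File round trip with UTF-8 on both the write and the read side.
        Path csv = Files.createTempFile("japanese-demo", ".csv");
        List<String> rows = Arrays.asList("id,label", "1," + sample);
        System.out.println(rows.equals(writeAndReadUtf8(csv, rows)));
        Files.delete(csv);
    }
}
```

The same idea applies inside a job: whichever component writes the CSV has to be set to UTF-8, and whatever reads the file afterwards has to decode it as UTF-8 too. A file that is correct on disk can still look garbled in a viewer that assumes a different encoding.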

If you want to translate to another language, I'd recommend using Google's translation API (https://cloud.google.com/translate/docs/).

