Will Talend be able to differentiate Chinese and English characters

One Star

Hi Team,

I have data in a column that contains both Chinese and English text. Please let me know whether Talend Open Studio for Data Quality can differentiate between the two, and if so, whether we need to install anything additional.

Please help me




Thanks,
Narendar V
Employee

Re: Will Talend be able to differentiate Chinese and English characters

Hi Narendar,

There is no problem handling a mix of Chinese and English, as long as both are encoded the same way.
You may need to specify an option in your database connection to retrieve the characters correctly.
One Star

Re: Will Talend be able to differentiate Chinese and English characters

Thanks for your reply,

I am profiling our data. Because we operate across several businesses, some customer names are in Chinese and others are in English, as shown below:

Customer Name
===================
??????????????
??????????????
????????????
??????????????
??????????????
???????????
Yatron Engineering Co., Ltd.
SILUMIN-VOSTOK Ltd
??????????????
AEG Engineering Limited
Verizon Hong Kong LTD

Talend recognizes the English characters but not the Chinese ones: each Chinese character is replaced with a question mark (?).

Is there any way to handle these multilingual names with Talend Data Quality?

Does Talend offer any support for the Chinese language?
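
As a rough illustration of what "differentiating" Chinese from English names could look like outside the tool, here is a minimal Python sketch. It is not how Talend does it: the function names `contains_cjk` and `classify_name`, the labels, and the Chinese sample name are all hypothetical, and only the two most common CJK ideograph blocks are checked. It also assumes the text was decoded correctly. Once characters have been replaced with `?`, the original script cannot be recovered.

```python
def contains_cjk(text: str) -> bool:
    """Return True if any character is a CJK ideograph (basic block or Extension A)."""
    return any(
        0x4E00 <= ord(ch) <= 0x9FFF or 0x3400 <= ord(ch) <= 0x4DBF
        for ch in text
    )


def classify_name(name: str) -> str:
    """Label a name as 'chinese', 'english', or 'other' (hypothetical labels)."""
    if contains_cjk(name):
        return "chinese"
    if any(ch.isascii() and ch.isalpha() for ch in name):
        return "english"
    return "other"


# The second name is a made-up example; the third mimics the damaged rows above.
names = ["Verizon Hong Kong LTD", "北京工程有限公司", "??????????????"]
labels = [classify_name(n) for n in names]
print(labels)
```

The row of question marks falls into "other", which shows that the encoding problem has to be fixed before any script-based profiling is meaningful.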


Thanks,
Narendar V
One Star

Re: Will Talend be able to differentiate Chinese and English characters

I receive the data as CSV files, so I am running the analysis on that sample sheet. Talend 5.0 can extract metadata from files, which was a good sign for us to go ahead with Talend. But we got stuck on the multilanguage issue.
Employee

Re: Will Talend be able to differentiate Chinese and English characters

Is your file correctly encoded in UTF-8?
And did you set the correct encoding in the file connection? Beware that by default, the encoding is US-ASCII, which is inappropriate here.
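
To illustrate why the encoding setting matters, the sketch below reproduces the symptom in plain Python rather than in Talend. The sample name is made up, and this shows only one common way question marks can arise (converting text to a charset that cannot represent it); the actual cause in a given Talend setup may differ.

```python
name = "北京工程"  # hypothetical 4-character Chinese name

utf8_bytes = name.encode("utf-8")

# Converting to US-ASCII with replacement yields one '?' per character.
as_ascii = name.encode("ascii", errors="replace")
print(as_ascii)  # b'????'

# Decoding the UTF-8 bytes with the matching charset round-trips cleanly.
round_trip_ok = utf8_bytes.decode("utf-8") == name
print(round_trip_ok)  # True

# Decoding the same bytes as US-ASCII fails: every byte is outside the ASCII range.
try:
    utf8_bytes.decode("ascii")
    ascii_decode_failed = False
except UnicodeDecodeError:
    ascii_decode_failed = True
print(ascii_decode_failed)  # True
```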

I think the Talend console in the DI perspective does not handle UTF-8 characters correctly (I'm not sure, though).
If that is the console you are talking about, try writing the data to a file instead.
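
If the console is the suspect, one way to check is to bypass it and inspect a file. Below is a minimal Python sketch of that idea, assuming UTF-8 output; the file name `customers_out.csv` and the Chinese sample name are hypothetical, and nothing here is Talend-specific.

```python
import csv
import os
import tempfile

rows = [["Customer Name"], ["北京工程有限公司"], ["AEG Engineering Limited"]]

# Hypothetical output file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "customers_out.csv")

# Write with an explicit encoding instead of relying on the platform default.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

# Read back with the same encoding; the names should come back unchanged.
with open(path, encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)
```

If the file contents are intact when opened in a UTF-8 aware editor, the data is fine and only the console display is at fault.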

Please explain exactly where the problem appears.
