Open Studio for DQ can not handle special characters in CSV-File encoded as utf-8

Four Stars

Hi,

I'm using Talend Open Studio for Data Quality, version 6.5.1, to analyze the quality of data in a CSV file encoded in UTF-8. If I select the indicator 'Soundex Frequency' for a column whose values contain special characters like "ü" and "é" and run the analysis, I get the following error message:

2018-05-04 17:14:20,232 ERROR org.talend.dq.analysis.AnalysisExecutor  - java.lang.IllegalArgumentException: The character is not mapped: Ü
java.lang.IllegalArgumentException: The character is not mapped: Ü
	at org.apache.commons.codec.language.Soundex.map(Soundex.java:226)
	at org.apache.commons.codec.language.Soundex.getMappingCode(Soundex.java:180)
	at org.apache.commons.codec.language.Soundex.soundex(Soundex.java:264)
	at org.talend.dataquality.indicators.impl.SoundexFreqIndicatorImpl.handle(SoundexFreqIndicatorImpl.java:283)
	at org.talend.dq.indicators.DelimitedFileIndicatorEvaluator.handleByARow(DelimitedFileIndicatorEvaluator.java:335)
	at org.talend.dq.indicators.DelimitedFileIndicatorEvaluator.useCsvReader(DelimitedFileIndicatorEvaluator.java:257)
	at org.talend.dq.indicators.DelimitedFileIndicatorEvaluator.executeSqlQuery(DelimitedFileIndicatorEvaluator.java:115)
	at org.talend.dq.indicators.Evaluator.evaluateIndicators(Evaluator.java:146)
	at org.talend.dq.indicators.Evaluator.evaluateIndicators(Evaluator.java:207)
	at org.talend.dq.analysis.DelimitedFileAnalysisExecutor.runAnalysis(DelimitedFileAnalysisExecutor.java:70)
	at org.talend.dq.analysis.AnalysisExecutor.execute(AnalysisExecutor.java:146)
	at org.talend.dq.analysis.AnalysisExecutorSelector.executeAnalysis(AnalysisExecutorSelector.java:171)
	at org.talend.dataprofiler.core.ui.action.actions.RunAnalysisAction$1.runInWorkspace(RunAnalysisAction.java:222)
	at org.eclipse.core.internal.resources.InternalWorkspaceJob.run(InternalWorkspaceJob.java:38)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)

 

I've already tried the solution from this post: https://community.talend.com/t5/Design-and-Development/Handling-special-characters/m-p/25169#M4268

I also checked "Allow specific characters (UTF8,...) for columns of schemas" under Window / Preferences / Talend / Specific Settings.

Neither solution worked for me.

Is there a workaround for this problem?

Thanks in advance

Frank

 

Moderator

Re: Open Studio for DQ can not handle special characters in CSV-File encoded as utf-8

Hello,

Have you tried adding -Dfile.encoding=utf8 to the Studio's .ini configuration file and restarting the Studio to see if that works?
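
A sketch of what that change could look like, assuming the usual Eclipse launcher layout in which JVM options follow a -vmargs line, one option per line. The exact name of the .ini file depends on your installation and platform, and UTF-8 is simply the canonical spelling of the charset name:

```
-vmargs
-Dfile.encoding=UTF-8
```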

Best regards

Sabrina

Employee

Re: Open Studio for DQ can not handle special characters in CSV-File encoded as utf-8

Hi,
We do not support running the 'Soundex Frequency' indicator on a column whose values contain special characters like "ü" and "é", or Chinese/Japanese characters.
This error is expected behavior, and we will not fix it.
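
For readers who still need a workaround: one possible approach, not something Talend provides, is to reduce the values to plain ASCII letters before the column reaches the indicator, for example in a preprocessing step on the CSV file. The sketch below assumes this is acceptable for your data. The class and method names (SoundexPreprocess, toAsciiLetters) are hypothetical and only the JDK is used. Accented letters such as "ü" and "é" are mapped to their base letters. Characters with no ASCII decomposition (for example "ß" or Chinese/Japanese characters) are dropped, so such values may end up shortened or empty.

```java
import java.text.Normalizer;

public class SoundexPreprocess {

    // Hypothetical helper, not part of Talend or commons-codec.
    // Decomposes accented letters (NFD), removes the combining marks,
    // then drops any character that is still outside the ASCII range.
    public static String toAsciiLetters(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        return withoutMarks.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        System.out.println(toAsciiLetters("M\u00fcller")); // prints "Muller"
        System.out.println(toAsciiLetters("Ren\u00e9"));   // prints "Rene"
    }
}
```

Whether "Müller" should become "Muller" or "Mueller" is a language-specific choice; the sketch takes the simpler route and only removes the diacritic.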
