
Open Studio for DQ cannot handle special characters in CSV file encoded as UTF-8

Hi, 

 

I'm using Talend Open Studio for Data Quality version 6.5.1 to analyze the quality of data in a CSV file encoded in UTF-8. If I select the 'Soundex Frequency' indicator for a column whose values contain special characters such as "ü" and "é" and then run the analysis, I get the following error message:

 

 

2018-05-04 17:14:20,232 ERROR org.talend.dq.analysis.AnalysisExecutor  - java.lang.IllegalArgumentException: The character is not mapped: Ü
java.lang.IllegalArgumentException: The character is not mapped: Ü
	at org.apache.commons.codec.language.Soundex.map(Soundex.java:226)
	at org.apache.commons.codec.language.Soundex.getMappingCode(Soundex.java:180)
	at org.apache.commons.codec.language.Soundex.soundex(Soundex.java:264)
	at org.talend.dataquality.indicators.impl.SoundexFreqIndicatorImpl.handle(SoundexFreqIndicatorImpl.java:283)
	at org.talend.dq.indicators.DelimitedFileIndicatorEvaluator.handleByARow(DelimitedFileIndicatorEvaluator.java:335)
	at org.talend.dq.indicators.DelimitedFileIndicatorEvaluator.useCsvReader(DelimitedFileIndicatorEvaluator.java:257)
	at org.talend.dq.indicators.DelimitedFileIndicatorEvaluator.executeSqlQuery(DelimitedFileIndicatorEvaluator.java:115)
	at org.talend.dq.indicators.Evaluator.evaluateIndicators(Evaluator.java:146)
	at org.talend.dq.indicators.Evaluator.evaluateIndicators(Evaluator.java:207)
	at org.talend.dq.analysis.DelimitedFileAnalysisExecutor.runAnalysis(DelimitedFileAnalysisExecutor.java:70)
	at org.talend.dq.analysis.AnalysisExecutor.execute(AnalysisExecutor.java:146)
	at org.talend.dq.analysis.AnalysisExecutorSelector.executeAnalysis(AnalysisExecutorSelector.java:171)
	at org.talend.dataprofiler.core.ui.action.actions.RunAnalysisAction$1.runInWorkspace(RunAnalysisAction.java:222)
	at org.eclipse.core.internal.resources.InternalWorkspaceJob.run(InternalWorkspaceJob.java:38)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)

 

I've already tried the solution from this post: https://community.talend.com/t5/Design-and-Development/Handling-special-characters/m-p/25169#M4268

I also checked "Allow specific characters (UTF8,...) for columns of schemas" under Window / Preferences / Talend / Specific Settings.

Neither of these worked for me.

 

Is there a workaround for this problem?

 

Thanks in advance

Frank

 

2 REPLIES
Moderator

Re: Open Studio for DQ can not handle special characters in CSV-File encoded as utf-8

Hello,

Have you tried adding -Dfile.encoding=utf8 to the Studio's .ini configuration file and restarting the Studio to see whether that helps?

Best regards

Sabrina

Employee

Re: Open Studio for DQ can not handle special characters in CSV-File encoded as utf-8

Hi,

We don't support running the 'Soundex Frequency' indicator on a column whose values contain special characters such as "ü" and "é", or Chinese/Japanese characters.

Getting this error is expected behavior, and we will not fix it.
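
Since the stack trace shows the exception coming from Soundex.map in Apache Commons Codec, one possible workaround is to pre-process the column before profiling it, so that the indicator only ever sees plain A-Z letters. The following is only a minimal sketch using the JDK's java.text.Normalizer: the class and method names (SoundexSafeText, stripDiacritics, isAsciiLettersOnly) are made up for illustration and are not part of Talend or Commons Codec, and where you apply it (for example, a separate step that rewrites the CSV before the analysis) is an assumption. Note that stripping accents changes the data, so "Müller" and "Muller" would end up in the same Soundex bucket.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Talend or Apache Commons Codec.
class SoundexSafeText {

    // Decompose accented letters (NFD) and drop the combining marks,
    // so that "Ü" becomes "U" and "é" becomes "e".
    static String stripDiacritics(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // True when the value contains no letters outside ASCII.
    // Values that fail this check even after stripDiacritics (for example
    // "ß" or Chinese/Japanese text) would still need to be filtered out
    // or handled separately before running the indicator.
    static boolean isAsciiLettersOnly(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isLetter(c) && c > 127) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"Müller", "René", "Straße", "東京"};
        for (String s : samples) {
            String cleaned = stripDiacritics(s);
            System.out.println(s + " -> " + cleaned
                    + " (safe: " + isAsciiLettersOnly(cleaned) + ")");
        }
    }
}
```

Values for which isAsciiLettersOnly still returns false after cleaning could be written to a separate file or column and left out of the Soundex Frequency analysis.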