Removing Special Characters

Six Stars

Hi All,

I am trying to import data from a CSV file that contains Chinese characters, and some characters in the file show up as �.

How do I import the data into Teradata without getting the "The string contains an untranslatable character" error message?

I would like to either remove the � or replace it with an empty space.

Please help!

Thanks in advance.

All Replies
Employee

Re: Removing Special Characters

Hi,

You can remove all non-ASCII characters by using the following expression in a tMap. Note that it removes every character outside the ASCII range, not only the garbled ones.

row2.input_data.replaceAll("[^\\x00-\\x7F]", "")
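To see what this expression does outside of a job, here is a minimal standalone sketch. The class name AsciiFilterDemo and the helper stripNonAscii are made up for illustration and are not part of Talend; only the regular expression is taken from the line above. The sample strings use Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex comes from the tMap expression above.
final class AsciiFilterDemo {
    // Matches any character outside the ASCII range 0x00-0x7F.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]");

    // Same effect as input.replaceAll("[^\\x00-\\x7F]", "").
    static String stripNonAscii(String input) {
        return NON_ASCII.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // U+FFFD is the replacement character shown as a question-mark diamond.
        String garbled = "Wang\uFFFDLi";
        // U+5C0F U+59D0 is a valid Chinese word.
        String chinese = "\u5C0F\u59D0 Wang";

        // The garbled character is removed ...
        System.out.println("[" + stripNonAscii(garbled) + "]");
        // ... but so are the valid Chinese characters.
        System.out.println("[" + stripNonAscii(chinese) + "]");
    }
}
```

The second call shows the side effect: valid Chinese text is stripped along with the garbled character, so this expression only fits data that is expected to be pure ASCII.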

Warm Regards,
Nikhil Thampi

Six Stars

Re: Removing Special Characters

Hi Nikhil,

But in that case, won't my Chinese words (such as 小姐) also be replaced with an empty string?

I just want to remove the garbled characters, such as ...  ''

Thanks.

Employee (Accepted Solution)

Re: Removing Special Characters

OK. In that case, please go to the Advanced settings of the tFileInput component and set the encoding to UTF-8.

These symbols appear because the file is being read with the wrong encoding. Once the encoding is correct, you can remove the tMap.
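The effect of the encoding setting can be reproduced in plain Java. The sketch below is only an illustration under assumptions: the class EncodingDemo and its helpers are hypothetical, and it assumes the CSV file really is UTF-8 encoded. It decodes the same bytes with the right and the wrong charset, and also shows a fallback that drops only U+FFFD (the � character) while leaving Chinese text alone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Talend.
final class EncodingDemo {
    // Decode raw file bytes with the given charset, as a file reader would.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Fallback: remove only the replacement character U+FFFD,
    // keeping every other character (including Chinese) intact.
    static String dropReplacementChars(String input) {
        return input.replace("\uFFFD", "");
    }

    public static void main(String[] args) {
        // Bytes of a UTF-8 encoded Chinese word (U+5C0F U+59D0), 3 bytes per character.
        byte[] raw = "\u5C0F\u59D0".getBytes(StandardCharsets.UTF_8);

        // Correct charset: the two original characters come back.
        System.out.println(decode(raw, StandardCharsets.UTF_8).length());      // 2

        // Wrong charset: each byte becomes its own (garbled) character.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1).length()); // 6

        // A charset that cannot map these bytes yields U+FFFD for each of them.
        String lossy = decode(raw, StandardCharsets.US_ASCII);
        System.out.println(lossy.equals("\uFFFD".repeat(6)));                  // true

        // The fallback keeps valid text and removes only U+FFFD.
        System.out.println(dropReplacementChars("A\uFFFDB\u5C0F").length());   // 3
    }
}
```

In a tMap, the fallback would presumably look like row2.input_data.replace("\uFFFD", ""), reusing the column name from the earlier reply. Keep in mind that once a character has been turned into U+FFFD the original value is lost, so fixing the encoding at the input component is the better option whenever the source file itself is intact.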

 

Warm Regards,
Nikhil Thampi

