[resolved] What is the best way to remove special characters from an entire file?

Nine Stars

I have files several GB in size that contain odd special characters, which make the files difficult to parse.
Is there an easy way to remove every character that is not a letter or a digit from the entire file?

Accepted Solutions
Nine Stars

Thanks archenroot and shong!
This worked:
row1.myRow.replaceAll("[^a-zA-Z0-9]", "");

All Replies
Community Manager

Use tFileInputFullRow to read the file one row at a time, remove or replace the special characters in each row, and write the result to a new file.
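Outside Talend, the same row-by-row approach can be sketched in plain Java. This is a minimal sketch, assuming a line-oriented text file where only ASCII letters and digits should survive; the class `CleanFile` and method `clean` are hypothetical names. Reading through a `BufferedReader` keeps memory use flat regardless of file size. Decoding as ISO-8859-1 is a deliberate choice: it maps every byte to a character, so malformed bytes can never abort the read, and every byte above 0x7F is removed by the pattern anyway.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class CleanFile {
    // Matches any character that is not an ASCII letter or digit.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // Streams `in` to `out` one line at a time, so memory use does not grow with file size.
    public static void clean(Path in, Path out) throws IOException {
        // ISO-8859-1 maps every byte to a char, so decoding never fails;
        // non-ASCII bytes become chars >= 0x80, which the pattern then removes.
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.US_ASCII)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(NON_ALNUM.matcher(line).replaceAll(""));
                writer.newLine(); // readLine strips the separator, so write one back
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CleanFile input.txt output.txt
        clean(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Line breaks survive as the platform line separator, because `readLine` strips them and `newLine` writes them back.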
Four Stars

You can also try re-encoding the file.
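Re-encoding can be sketched in Java as a streaming copy from one charset to another. This is a minimal sketch under assumptions: you know (or can guess) the source encoding, and the class `Reencode` is a hypothetical name. The decoder and encoder are both set to `CodingErrorAction.REPLACE`, so bytes that are invalid in the source charset, and characters the target charset cannot represent, are substituted with a replacement character instead of throwing an exception.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Reencode {
    // Copies `in` (decoded as `from`) to `out` (encoded as `to`), streaming in fixed-size chunks.
    public static void reencode(Path in, Charset from, Path out, Charset to) throws IOException {
        // Invalid input bytes become U+FFFD instead of raising MalformedInputException.
        CharsetDecoder decoder = from.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // Characters the target charset cannot represent become its replacement (e.g. '?').
        CharsetEncoder encoder = to.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try (Reader reader = new BufferedReader(new InputStreamReader(Files.newInputStream(in), decoder));
             Writer writer = new BufferedWriter(new OutputStreamWriter(Files.newOutputStream(out), encoder))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Reencode in.txt ISO-8859-1 out.txt UTF-8
        reencode(Paths.get(args[0]), Charset.forName(args[1]), Paths.get(args[2]), Charset.forName(args[3]));
    }
}
```

For example, decoding as UTF-8 and encoding as US-ASCII turns an invalid byte into U+FFFD and then into `?`, which a later cleanup pass can strip.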
Six Stars

As an extension of the tFileInputFullRow solution, connect it to a tJavaRow containing:
output_row.myRow = input_row.myRow.replaceAll("[^a-zA-Z0-9]", "");

Then connect the tJavaRow to the output file. Only alphanumeric characters will be left in the file.
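Because the tJavaRow code runs once per row, it may be worth compiling the pattern only once: `String.replaceAll` compiles its regex on every call, whereas a `static final Pattern` is compiled a single time. The plain-Java sketch below shows the idea; the class `RowCleaner` and its null handling are assumptions, not part of the original answer. In a Talend job, similar code could live in a user routine.

```java
import java.util.regex.Pattern;

public class RowCleaner {
    // Anything that is not an ASCII letter or digit; compiled once, reused for every row.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // Returns the row with every non-alphanumeric character removed; null stays null.
    public static String clean(String row) {
        return row == null ? null : NON_ALNUM.matcher(row).replaceAll("");
    }

    public static void main(String[] args) {
        // clean returns "id42nameOBrien" for this input
        System.out.println(clean("id=42; name=\"O'Brien\"\t"));
    }
}
```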
Ladislav
One Star

Hello,
Could this be used to get rid of characters such as ® and ™?
Thanks
Six Stars

Yes. That pattern removes anything that is not a-z, A-Z, or 0-9, so ® and ™ are removed as well.
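A quick plain-Java check of this, using a sample string made up for illustration. Note that the ASCII class also drops accented letters such as é. If those should be kept, a Unicode-aware class such as `[^\p{L}\p{N}]` is one option; it is a swapped-in alternative, not something suggested in the thread. The class and method names are hypothetical.

```java
public class SymbolDemo {
    // Pattern described in the thread: removes everything except ASCII letters and digits.
    public static String asciiOnly(String s) {
        return s.replaceAll("[^a-zA-Z0-9]", "");
    }

    // Swapped-in alternative (not from the thread): keeps letters and digits in any script.
    public static String unicodeAware(String s) {
        return s.replaceAll("[^\\p{L}\\p{N}]", "");
    }

    public static void main(String[] args) {
        String sample = "Acme\u00AE Caf\u00E9\u2122 2019"; // "Acme® Café™ 2019"
        System.out.println(asciiOnly(sample));    // returns "AcmeCaf2019" (é is dropped too)
        System.out.println(unicodeAware(sample)); // returns "AcmeCafé2019"
    }
}
```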
One Star

I also found this one. It works a treat and keeps the spaces:
row1.myRow.replaceAll("[^a-zA-Z0-9 ]", "");
Thanks
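The exact pattern in this reply was lost, so the sketch below assumes it was the alphanumeric class with a literal space added, `[^a-zA-Z0-9 ]`; the class and method names are hypothetical. Only the ordinary space (U+0020) is kept, so tabs are still removed. Removing a symbol that sat between two spaces leaves a double space behind, so an optional second step collapses runs of spaces.

```java
public class KeepSpaces {
    // Assumed reconstruction: ASCII letters, digits, and the plain space survive.
    public static String keepSpaces(String row) {
        return row.replaceAll("[^a-zA-Z0-9 ]", "");
    }

    // Optional extra step: collapse runs of spaces left behind, then trim the ends.
    public static String keepSingleSpaces(String row) {
        return keepSpaces(row).replaceAll(" {2,}", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(keepSpaces("Acme\u00AE Widget\u2122 v2.0\tbeta")); // "Acme Widget v20beta"
        System.out.println(keepSingleSpaces("a  \u00A9  b"));                // "a b"
    }
}
```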
