Nine Stars

[resolved] What is the best way to remove special characters from an entire file?

I have files several GB in size that contain unusual special characters, and those characters make the files difficult to parse.
Is there an easy way to remove every character that is not a letter or a digit from an entire file?
Accepted Solution
Nine Stars

Re: [resolved] What is the best way to remove special characters from an entire file?

Thanks archenroot and shong!
This worked:
output_row.myRow = row1.myRow.replaceAll("[^a-zA-Z0-9]", "");
Replies
Community Manager

Use tFileInputFullRow to read the file one row at a time, remove or replace the special characters in each row, and write the result to a new file.
----------------------------------------------------------
Talend | Data Agility for Modern Business
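The row-by-row approach can be sketched in plain Java outside Talend. This is a minimal sketch, assuming the input is UTF-8 and that "special" means anything outside a-z, A-Z and 0-9; the class name FullRowCleaner is hypothetical. Because it reads and writes one line at a time, memory use stays flat even for multi-GB files.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class FullRowCleaner {
    // Assumption: "special" = anything outside ASCII letters and digits.
    private static final Pattern SPECIAL = Pattern.compile("[^a-zA-Z0-9]");

    // Per-row step, equivalent to what a tJavaRow would do for one row.
    public static String cleanRow(String row) {
        return SPECIAL.matcher(row).replaceAll("");
    }

    // Streams the file one line at a time, so memory use does not grow with file size.
    public static void cleanFile(Path in, Path out) throws IOException {
        // InputStreamReader replaces malformed bytes with U+FFFD instead of throwing;
        // cleanRow then strips those replacement characters as well.
        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(in), StandardCharsets.UTF_8));
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(cleanRow(line));
                writer.newLine(); // keep one output row per input row
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FullRowCleaner <input-file> <output-file>");
            System.exit(1);
        }
        cleanFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

InputStreamReader is used here instead of Files.newBufferedReader because the latter throws on invalid UTF-8 bytes, which is likely in a file that already has "weird" characters.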
One Star

You can also try re-encoding the file in a different character encoding.
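A minimal Java sketch of re-encoding, assuming the source file is ISO-8859-1 (Latin-1) and the target is UTF-8; the real source encoding has to be known or guessed, and the class name ReEncode is hypothetical. It copies in fixed-size chunks, so it also works on multi-GB files. Re-encoding only repairs characters that were misread because of a wrong encoding; it does not delete anything.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReEncode {
    // Decodes `in` with `from` and writes it to `out` encoded with `to`, chunk by chunk.
    public static void reEncode(Path in, Charset from, Path out, Charset to) throws IOException {
        try (Reader reader = new InputStreamReader(Files.newInputStream(in), from);
             Writer writer = new OutputStreamWriter(Files.newOutputStream(out), to)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the source is Latin-1; replace with the file's real encoding.
        reEncode(Paths.get(args[0]), StandardCharsets.ISO_8859_1,
                 Paths.get(args[1]), StandardCharsets.UTF_8);
    }
}
```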
Six Stars

Just an extension to the tFileInputFullRow solution...
Connect it to a tJavaRow containing:
output_row.myRow = row1.myRow.replaceAll("[^a-zA-Z0-9]", "");

Then connect the tJavaRow to the output file. Only alphanumeric characters will be left in the file.
Ladislav
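To see exactly what the tJavaRow expression keeps, the same replaceAll call can be run on a sample row outside Talend. A minimal sketch; the class name and the sample row are made up. Note that a-z and A-Z cover only unaccented ASCII letters, so accented letters such as é are removed along with spaces, tabs and punctuation.

```java
public class AlnumFilter {
    // Same regex as the tJavaRow step: delete every character outside a-z, A-Z, 0-9.
    public static String keepAlnum(String row) {
        return row.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        String row = "id=42; name=\"Caf\u00e9\"\t(ok)";
        System.out.println(keepAlnum(row)); // prints id42nameCafok
    }
}
```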
One Star

Hello 
Could this be used to get rid of characters like ® and ™?
Thanks
Six Stars

That should remove anything that is NOT a-z, A-Z or 0-9, so ® and ™ will be removed too.
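A quick check of that answer, with ® (U+00AE) and ™ (U+2122) written as Unicode escapes. This sketch also precompiles the pattern: String.replaceAll compiles its regex on every call, so for a multi-GB file it is cheaper to compile once and reuse the Pattern. The class name TrademarkStrip is hypothetical.

```java
import java.util.regex.Pattern;

public class TrademarkStrip {
    // Compiled once; String.replaceAll would recompile this regex for every row.
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    public static String strip(String row) {
        return NON_ALNUM.matcher(row).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("Acme\u00AE Widget\u2122 2000")); // prints AcmeWidget2000
    }
}
```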
One Star

I also found this; it works a treat and keeps the spaces:
output_row.myRow = row1.myRow.replaceAll("[^a-zA-Z0-9 ]", "");
Thanks
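The only difference between the space-preserving regex and the plain alphanumeric one is the space inside the character class. The sketch below compares them on one sample row; it assumes the space-preserving pattern is [^a-zA-Z0-9 ], and the class name KeepSpaces is hypothetical. If tabs and other whitespace should also survive, \s can go inside the class instead of a literal space (written "\\s" in a Java string).

```java
public class KeepSpaces {
    // Like the alphanumeric filter, but the ' ' inside the class keeps spaces too.
    public static String keepAlnumAndSpace(String row) {
        return row.replaceAll("[^a-zA-Z0-9 ]", "");
    }

    public static void main(String[] args) {
        String row = "Acme\u00AE Widget\u2122, model #2000";
        System.out.println(row.replaceAll("[^a-zA-Z0-9]", "")); // prints AcmeWidgetmodel2000
        System.out.println(keepAlnumAndSpace(row));              // prints Acme Widget model 2000
    }
}
```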