I have been given a .txt file which contains lines that look like this:
Here, for example, we have 4 lines with 5 columns each. In the 4th column, some characters are not recognized as UTF-8 characters.
What I would like to do is either 1/ erase those wrong characters, 2/ replace them with a space, or 3/ recover them so that they can be read correctly.
I tried to use a regex in a tMap component in order to erase or replace the wrong characters.
But it didn't work! The wrong characters stay exactly the same...
I also tried using Notepad++ to convert my file from UTF-8 back to ANSI, but that does not work either: the characters don't revert to what they should be. So using a routine to change the encoding of my file is not really an option either.
I am starting to run out of ideas and options. Does anyone have a good idea to share?
PS: I am attaching my test file in case anyone wants to run some tests.
Try WINDOWS-1252 / CP-1252
If the data comes directly from a database, ask its owner/sender which collation is used in the table settings.
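To illustrate what trying WINDOWS-1252 means in practice, here is a minimal Java sketch that decodes the same raw bytes once as UTF-8 and once as Windows-1252. The class and method names (`Cp1252Demo`, `decodeCp1252`, `decodeUtf8`) are made up for this example, and nothing in the thread confirms that the file really is Windows-1252, so treat it as a way to test the hypothesis rather than as a fix.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Talend.
final class Cp1252Demo {
    // "windows-1252" is the charset name Java uses for CP-1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Interprets the raw bytes as Windows-1252 text. */
    static String decodeCp1252(byte[] raw) {
        return new String(raw, CP1252);
    }

    /**
     * Interprets the raw bytes as UTF-8. Byte sequences that are not
     * valid UTF-8 are replaced by U+FFFD (the replacement character).
     */
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

For instance, the single byte `0xE9` is "é" in Windows-1252 but is not a valid UTF-8 sequence on its own, so `decodeUtf8` yields a replacement character for it. In a Talend job the equivalent step would be setting the encoding on the file input component instead of calling code like this.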
Thank you for your quick answer, but I tried that and it didn't work.
Even if I knew what the native encoding was, I think reverting the file to that encoding would still be impossible.
Is there any other way to capture those characters in order to erase them? I think that might be simpler. I tried a regex that allows only alphanumeric characters ([^a-zA-Z0-9]), but I couldn't capture, change, or erase the wrong characters. Did I miss something here?
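For the erase/replace options, a minimal Java sketch of a regex-based cleaner could look like the following. `BadCharCleaner`, `erase`, and `blank` are hypothetical names, and the character class keeps printable ASCII only, which is an assumption about what counts as "wrong" here (it also removes legitimate accented letters).

```java
// Hypothetical helper; in Talend this could live in a user routine.
final class BadCharCleaner {
    // Matches every character outside printable ASCII (space to tilde).
    // This includes U+FFFD, the replacement character produced when
    // invalid bytes are decoded as UTF-8.
    private static final String NON_PRINTABLE_ASCII = "[^\\x20-\\x7E]";

    /** Option 1: erase the unwanted characters. */
    static String erase(String value) {
        return value == null ? null : value.replaceAll(NON_PRINTABLE_ASCII, "");
    }

    /** Option 2: replace each unwanted character with a single space. */
    static String blank(String value) {
        return value == null ? null : value.replaceAll(NON_PRINTABLE_ASCII, " ");
    }
}
```

One thing worth checking when a regex seems to have no effect: Java strings are immutable, so `replaceAll` returns a new string and leaves the original untouched. In a tMap, the call itself has to be the output expression, for example `BadCharCleaner.erase(row1.col4)`, where `row1` and `col4` are placeholder names for the input row and the 4th column.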
OK, now I get it. Even though I would have preferred a quicker solution, I will do it that way to get a durable solution.
Thank you for your input and all the explanations, @Dijke !