[resolved] How can I replace all "Ê"? It is the ASCII 202CA character

Nine Stars

When I export from Excel to .csv, I get the Ê character (decimal 202, hex CA) in my data.
How can I use replaceAll to remove every "Ê" from the string?
This doesn't seem to work; the tMap converts the "Ê" to a "?":
row1.myColumn.replaceAll("Ê","").trim()

Sample data:
abcÊÊabc
Desired output:
abcabc

Accepted Solutions
Employee

Re: [resolved] How can I replace all "Ê"? It is the ASCII 202CA character

The character "Latin capital letter E with circumflex" is not ASCII 202, because ASCII does not extend beyond 127. If your file is encoded in ISO-8859-1, the character is indeed stored as a single byte, decimal 202 (0xCA). If the file is encoded in UTF-8, it is stored as two bytes, 0xC3 0x8A.
Java strings use UTF-16 internally, so once the file has been decoded correctly the character is \u00ca.
So you need to make sure that you read the file into Java with the encoding it was actually written in. Only then will the character show up as \u00ca in Java. To get rid of the character, you should then be able to use:
...replaceAll ("\u00ca", "")
Hope this helps.
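
To make the above concrete, here is a minimal standalone sketch in plain Java (outside Talend). The class and method names are made up for illustration, and it assumes the exported bytes really are ISO-8859-1, which is what the thread ended up confirming:

```java
import java.nio.charset.StandardCharsets;

public class StripCircumflexE {
    // Hypothetical helper: decode the raw bytes as ISO-8859-1,
    // then remove every U+00CA and trim, as in the tMap expression.
    static String clean(byte[] raw) {
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        return text.replaceAll("\u00ca", "").trim();
    }

    public static void main(String[] args) {
        // The sample data from the question, as it would sit in an ISO-8859-1 file.
        byte[] latin1 = "abc\u00ca\u00caabc".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length); // prints 8 (one byte per character)

        // The same character in UTF-8 takes two bytes.
        byte[] utf8 = "\u00ca".getBytes(StandardCharsets.UTF_8);
        System.out.printf("%02x %02x%n", utf8[0] & 0xff, utf8[1] & 0xff); // prints c3 8a

        System.out.println(clean(latin1)); // prints abcabc
    }
}
```

Since the pattern contains no regex metacharacters, String.replace("\u00ca", "") would do the same job without going through the regex engine.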

All Replies
Nine Stars

Re: [resolved] How can I replace all "Ê"? It is the ASCII 202CA character

More info on the character I am trying to remove:
DEC: 202
OCT: 312
HEX: CA
BIN: 11001010
Symbol:Ê
Description:Latin capital letter E with circumflex
Nine Stars

Re: [resolved] How can I replace all "Ê"? It is the ASCII 202CA character

I changed the file encoding to ISO-8859-1 and no longer got the weird characters.

Thanks!
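
For anyone landing here later, a small sketch of why the encoding setting matters. It assumes the file was previously being read as UTF-8, which the thread does not actually state, so treat that part as a guess; the class and method names are invented for the example:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeComparison {
    // Hypothetical helper: decode the same bytes with a chosen charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "abc", two 0xCA bytes, "abc": the sample row as ISO-8859-1 bytes.
        byte[] raw = {'a', 'b', 'c', (byte) 0xCA, (byte) 0xCA, 'a', 'b', 'c'};

        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);

        // Correct charset: the character arrives as U+00CA and can be replaced.
        System.out.println(asLatin1.contains("\u00ca")); // prints true

        // Wrong charset: a lone 0xCA is malformed UTF-8, so Java substitutes
        // U+FFFD (often displayed as "?"), and a U+00CA pattern no longer matches.
        System.out.println(asUtf8.contains("\uFFFD"));   // prints true
        System.out.println(asUtf8.contains("\u00ca"));   // prints false
    }
}
```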