Hi! I'm running Talend 5.0.1 and I'm struggling to remove or replace control characters. This is what I have: 1. tMysqlInput reading from a MySQL database with a utf8_general_ci encoding, where some of the characters appear as an "em symbol" in the query output 2. tReplace, where I'm trying to replace "\u0025" with an empty string "" 3. tMap component 4. tAdvancedFileOutput, where the output encoding is set to UTF8. I thought I'd solve the problem by enclosing the text in "<![CDATA[ ]]>" but it didn't help. Furthermore, the tReplace component seems to be unable to replace the single "em" character by looking for "\u0025". If I don't enclose the text with the CDATA directive I get written to the file, which causes problems when I try to index the XML in another system. I hope you're able to help me here because I'm 100% stuck with this. Many thanks!
This is what it looks like: Part of the data from MySQL: "... continued its mission..." - I believe the strange character gets encoded to "" when using a tAdvancedFileOutput component. I'd like to replace it with "'" or get rid of it altogether. There are a number of similar characters, but I thought that if I got rid of the one above first, I could use the same approach for the rest. Cheers!
Hi Pedro! Thanks for your suggestion. I do have a question though: the character I wanted to remove/replace was not the "..." but the single character that looked "strange" in my posting. The "..." was included to show that there was leading and trailing text. Cheers!
Hi, This issue is the same as with the Euro symbol. I tried to replace ? with "", but unfortunately it doesn't work. I reported it on the BugTracker several days ago. You may try it with a Java method. Regards, Pedro
Hmmmm, the characters I'm trying to remove are \u0019, \u0025, \u0028 and \u0029, and they're shown as "strange single character" characters. I checked the original file and it is UTF8-encoded... Cheers
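One way to confirm exactly which characters are in the column is to print their code points. The sketch below is only an illustration: the class name `CodePointDump` and the sample string are made up and are not taken from the job in this thread. It is also worth keeping in mind that in a Java string literal "\u0019" is the EM control character (decimal 25), whereas "\u0025", "\u0028" and "\u0029" are the ordinary printable characters "%", "(" and ")", so a decimal/hex mix-up may be part of the problem here.

```java
final class CodePointDump {
    // Returns the code points of a string as "U+XXXX" tokens separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample: the letter 'a', the EM control character, the letter 'b'.
        System.out.println(describe("a\u0019b")); // prints U+0061 U+0019 U+0062
        // These three escapes are plain printable characters, not control characters.
        System.out.println("\u0025\u0028\u0029"); // prints %()
    }
}
```

Running something like this on a problem value (for instance from a tJavaRow, which is an assumption about how the job could be instrumented) should show which code points actually need to be targeted.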
Hi! I managed to find a workaround. This is what I did: Using a tMap component, I invoked the 'replaceAll' method on the column causing the problem: <column>.replaceAll("","") I hope that it will become possible to use a similar approach using the tReplace component in the future. Cheers!
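For anyone wanting to cover all of the similar characters in one go, the same replaceAll idea can be sketched with a character class instead of a single character. This is a minimal sketch under assumptions, not the exact expression used above: the class and method names are hypothetical, and the range covers the C0 control characters except tab, line feed and carriage return. As far as I know, those are the control characters XML 1.0 does not allow at all, which would also explain why wrapping the text in CDATA did not help.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ControlCharCleaner {
    // C0 control characters except tab (0x09), line feed (0x0A) and carriage return (0x0D).
    private static final Pattern CONTROL_CHARS =
            Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]");

    // Removes the matched control characters; null input is passed through.
    static String stripControlChars(String input) {
        if (input == null) {
            return null;
        }
        return CONTROL_CHARS.matcher(input).replaceAll("");
    }

    // Replaces each matched control character with the given literal text.
    static String replaceControlChars(String input, String replacement) {
        if (input == null) {
            return null;
        }
        // quoteReplacement keeps '$' and '\' in the replacement from being treated specially.
        return CONTROL_CHARS.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Made-up sample values containing the EM character U+0019.
        System.out.println(stripControlChars("its\u0019 mission"));        // prints its mission
        System.out.println(replaceControlChars("mission\u0019s", "'"));    // prints mission's
    }
}
```

In a tMap expression the equivalent one-liner would be along the lines of row1.description.replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", ""), where row1.description is a hypothetical column name.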
Hi, Couldn't you use the tReplace component with a regular expression that allows standard characters only? I'm not that good with regex, but something like allows only alphanumeric characters including spaces. Hope this helps. Regards, Arno
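A whitelist along the lines Arno describes could look like the sketch below. The pattern is a guess at what was meant, not the expression from the post, and the class and method names are hypothetical. Note that the ASCII version also strips punctuation and accented letters, so the second method shows a Unicode-aware variant that may suit UTF-8 data better.

```java
final class WhitelistFilter {
    // Keeps only ASCII letters, digits and spaces; everything else is removed.
    static String keepAsciiAlphanumericAndSpaces(String input) {
        if (input == null) {
            return null;
        }
        return input.replaceAll("[^A-Za-z0-9 ]", "");
    }

    // Keeps any Unicode letter or number plus spaces, so accented letters survive.
    static String keepLettersNumbersAndSpaces(String input) {
        if (input == null) {
            return null;
        }
        return input.replaceAll("[^\\p{L}\\p{N} ]", "");
    }

    public static void main(String[] args) {
        // Made-up sample values; U+0019 is the EM control character.
        System.out.println(keepAsciiAlphanumericAndSpaces("its\u0019 mission 42!")); // prints its mission 42
        System.out.println(keepLettersNumbersAndSpaces("caf\u00e9\u0019 5"));
    }
}
```

Whether tReplace accepts such a pattern in Talend 5.0.1 is not something this sketch can confirm; calling the method from a tMap expression is the route already reported to work in this thread.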
@avdbrink: hmmm, not sure - I tried to replace something like "\u000c" for example but never got it to work with tReplace... it could be that I provided the parameters wrongly. The "workaround" I applied works fine, so I'll stick with that for the time being. Thanks for your suggestion though. Cheers