!!! Fighting hard with control characters !!!

Hi!
I'm running Talend 5.0.1 and I'm struggling to get rid of, or replace, control characters. This is what I have:
1. tMysqlInput reading from a MySQL database with the utf8_general_ci collation, where some of the characters appear as an "em symbol" in the query output
2. tReplace, where I'm trying to replace "\u0025" with an empty string ""
3. tMap component
4. tAdvancedFileOutput, where the output encoding is set to UTF-8
I thought I'd avoid the problem by enclosing the text in "<![CDATA[...]]>", but it didn't help :( Furthermore, the tReplace component seems to be unable to replace the single "em" character when I search for "\u0025". If I don't enclose the text with the CDATA directive I get written to the file which causes problems when I try to index the XML in another system...
Hope you're able to help me here because I'm 100% stuck with this...
Many thanks!
13 REPLIES

Re: !!! Fighting hard with control characters !!!

Hi
Let's simplify this issue.
Please show me some details, such as the input data and the expected output.
Regards,
Pedro

Re: !!! Fighting hard with control characters !!!

Hi Pedro!
:)
This is what it looks like:
Part of the data from MySQL: "... continued its mission..." - I believe the strange character gets encoded to "" when using a tAdvancedFileOutput component. I'd like to replace it with "'" or get rid of it altogether...
There are a number of similar characters, but I thought that if I get rid of the one above first, I could use the same approach for the rest :)
Cheers!

Re: !!! Fighting hard with control characters !!!

Hi
Set tReplace as shown in the following image.
Don't check "Whole word".
Or did I misunderstand what you mean?
Regards,
Pedro

Re: !!! Fighting hard with control characters !!!

Hi Pedro!
Thanks for your suggestion. I do have a question though: the character I wanted to remove/replace was not the "..." but the single character that looked "strange" in my posting. The "..." was included to show that there was leading and trailing text :)
Cheers!

Re: !!! Fighting hard with control characters !!!

Hi
Got it this time.
Do as shown in the image above.
I'm not sure whether tReplace can replace special characters, but you can try it.
Just make sure you don't check "Whole word".
Regards,
Pedro

Re: !!! Fighting hard with control characters !!!

Uh uh - this sounds a bit troublesome. Do you think I could do something like this instead:
row2.Summary.replace("\u0025", "")
or otherwise specify "\u0025" while using tReplace?
Cheers!
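
A note on the escape used above, with a small sketch: in a Java string literal, "\u0025" is the percent sign (U+0025), while the control character END OF MEDIUM is U+0019 (decimal 25), so it is possible that the search string targets the wrong character. The class and method names below are hypothetical and not part of Talend; the snippet only illustrates how `String.replace` treats the two escapes.

```java
// Hypothetical helper for illustration only; not a Talend API.
class EscapeCheck {
    // Removes every occurrence of a literal target string (no regex involved).
    static String removeLiteral(String text, String target) {
        return text == null ? null : text.replace(target, "");
    }

    public static void main(String[] args) {
        // Sample value containing the control character U+0019 and a percent sign (U+0025).
        String summary = "rock\u0019n roll 100%";

        // Targeting U+0025 strips the percent sign and leaves the control character in place.
        String withoutPercent = removeLiteral(summary, "\u0025");

        // Targeting U+0019 strips the control character and leaves the percent sign in place.
        String withoutControl = removeLiteral(summary, "\u0019");

        System.out.println(withoutPercent.length());
        System.out.println(withoutControl);
    }
}
```

In a tMap expression the equivalent call would presumably be row2.Summary.replace("\u0019", ""), assuming the column is not null.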

Re: !!! Fighting hard with control characters !!!

Hi
This issue is the same as the one with the Euro symbol.
I tried to replace ? with "", but unfortunately it doesn't work. I reported it on the BugTracker several days ago.
You could try it with a Java method instead.
Regards,
Pedro

Re: !!! Fighting hard with control characters !!!

Hi Pedro!
Many thanks for your help. I hope this gets solved soon. In the meantime I need to find another solution...
Cheers

Re: !!! Fighting hard with control characters !!!

Is the data actually UTF-8? If so, it should display properly.

Re: !!! Fighting hard with control characters !!!

Hmmmm, the characters I'm trying to remove are \u0019, \u0025, \u0028 and \u0029, and each of them is shown as a "strange single character" :)
I checked the original file and it is UTF-8 encoded...
Cheers

Re: !!! Fighting hard with control characters !!!

Hi!
I managed to find a workaround. This is what I did:
Using a tMap component, I invoked the 'replaceAll' method on the column causing the problem: <column>.replaceAll("","")
I hope that it will become possible to use a similar approach using the tReplace component in the future.
Cheers!
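
For readers who want to apply the replaceAll workaround to several control characters at once, here is a minimal sketch. It assumes the goal is to drop (or substitute) every ASCII control character except tab, line feed and carriage return; the class name, method name and that choice of exceptions are illustrative assumptions, not something stated in the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Talend API.
class ControlCharCleaner {
    // \p{Cntrl} covers U+0000-U+001F and U+007F; the intersection keeps tab, LF and CR untouched.
    private static final Pattern CONTROL_CHARS =
            Pattern.compile("[\\p{Cntrl}&&[^\\t\\n\\r]]");

    // Replaces each matched control character with the given replacement ("" removes it).
    static String clean(String text, String replacement) {
        if (text == null) {
            return null;
        }
        // quoteReplacement keeps '$' and '\' in the replacement from being read as group references.
        return CONTROL_CHARS.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }
}
```

Inside a tMap expression the same idea could be written inline as <column>.replaceAll("[\\p{Cntrl}&&[^\\t\\n\\r]]", ""), again assuming the column is never null.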

Re: !!! Fighting hard with control characters !!!

Hi,
Couldn't you use the tReplace component with a regular expression that allows standard characters only? I'm not that good with regex, but something that allows only alphanumeric characters, including spaces.
Hope this helps.
Regards,
Arno
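
The allow-list idea above could look roughly like the sketch below in plain Java. Whether tReplace accepts such a pattern is not confirmed in this thread, so the example uses String.replaceAll directly; the class and method names are hypothetical. Note that \p{Alnum} is ASCII-only by default in Java, so the strict variant also strips punctuation and accented letters, which is why a looser variant is shown next to it.

```java
// Hypothetical helper for illustration only; not a Talend API.
class WhitelistFilter {
    // Strict variant: keeps only ASCII letters, ASCII digits and the space character.
    static String keepAlnumAndSpaces(String text) {
        return text == null ? null : text.replaceAll("[^\\p{Alnum} ]", "");
    }

    // Looser variant: keeps Unicode letters, numbers, punctuation and space separators.
    // Line breaks, tabs, symbols such as '+' or '$', and control characters are removed.
    static String keepLettersDigitsPunct(String text) {
        return text == null ? null : text.replaceAll("[^\\p{L}\\p{N}\\p{P}\\p{Zs}]", "");
    }
}
```

The strict variant is simple but lossy; the looser one is closer to "remove only what is not ordinary text", at the cost of a longer pattern.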

Re: !!! Fighting hard with control characters !!!

@avdbrink:
Hmmm, not sure - I tried to replace something like "\u000c", for example, but never got it to work with tReplace... it could be that I provided the parameters wrongly, but...
The "workaround" I applied works fine, so I'll stick with that for the time being. Thanks for your suggestion though :)
Cheers