Invalid XML character in json file


Hi,
I am trying to parse a JSON file and I am getting the following error:
Error on line 67114 of document  : An invalid XML character (Unicode: 0x3) was found in the element content of the document. Nested exception: An invalid XML character (Unicode: 0x3) was found in the element content of the document.
I checked that location in the file and it contains a value like left\"
I checked the XML Unicode library and applied various replaceAll calls, such as
.replaceAll("\r\n", "")
.replaceAll("\r", "")
.replaceAll("\n", "")
.replaceAll("\\0xA0", "")
.replaceAll("\\0x3", "")
in various combinations. None of them helped.
One search suggested that it is a control character introduced by an error while encoding the file, so I used .replaceAll("\\0xA0", "").
I also tried copying the data from the original file (.txt) to another file (.txt), keeping the encoding as UTF-8 on both sides, and used the output as input: still the same error. I tried the same after converting it to a .json file.
I also removed the words 'xml' and '!DOCTYPE' from the file to avoid any XML conflict, but the issue remains.
tFileInputJSON itself is raising the error, so that rules out any issue with tMap and the other components.
Can someone shed some light on this issue?
Moderator

Re: Invalid XML character in json file

Hi,
Could you please try replacing the ASCII character 0x8 with an empty string,
String.replaceAll("\\x08","")
to see if it works for you?
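If 0x8 turns out not to be the only offender (the error above reports 0x3), a more general option is to remove every character that XML 1.0 does not allow. Below is a minimal sketch, not a tested Talend routine. The class name XmlCharCleaner and the method name stripInvalidXmlChars are hypothetical, and it assumes the file content is available as a Java String before tFileInputJSON reads it. Note also that in a Java regex the hex escape is written \\x03 (or \\u0003); a pattern such as "\\0x3" is not hex escape syntax, which may be why the earlier attempts had no effect.

```java
import java.util.regex.Pattern;

// Hypothetical helper: names are illustrative, not part of Talend.
final class XmlCharCleaner {

    // XML 1.0 allows: #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF.
    // The negated character class below matches everything outside that set.
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
        "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Returns the input with all characters that are invalid in XML 1.0 removed.
    static String stripInvalidXmlChars(String input) {
        if (input == null) {
            return null;
        }
        return INVALID_XML_CHARS.matcher(input).replaceAll("");
    }
}
```

Since tFileInputJSON fails while reading the file, the cleaning would presumably have to run on the raw file content in an earlier step of the job (that placement is an assumption about the job layout), with the cleaned file then used as the input.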
Best regards
Sabrina

Re: Invalid XML character in json file

Hi xdshi,
Thanks for the solution. It didn't help completely, but it helped to some extent.
Previously, the issue mentioned above occurred as soon as tFileInputJSON read the file. Now it reads the file and parses rows until it hits the line with the invalid XML character. Once the invalid XML character comes up, everything after it is skipped.
A PHP colleague of mine gave me this snippet:
strip_tags($js->htmlContent->htmlBody)
I can't figure out how to use it in Talend.
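For reference, a rough Java approximation of that PHP call could look like the sketch below. The class name TagStripper and the method name stripTags are hypothetical, and the regex is only a crude stand-in for PHP's strip_tags: it is not an HTML parser and will mishandle a '>' inside an attribute value or a comment. It also removes markup rather than control characters such as 0x3, so on its own it may not address the error above.

```java
// Hypothetical helper: a crude stand-in for PHP's strip_tags, not an HTML parser.
final class TagStripper {

    // Removes anything that looks like a tag: '<', then any run of non-'>' characters, then '>'.
    static String stripTags(String html) {
        if (html == null) {
            return null;
        }
        return html.replaceAll("<[^>]*>", "");
    }
}
```

Assuming a String column named htmlBody (a hypothetical name), the call would be TagStripper.stripTags(row.htmlBody) wherever the job allows a Java expression.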