Hi,

I am trying to parse a JSON file and I am getting the following error:

Error on line 67114 of document : An invalid XML character (Unicode: 0x3) was found in the element content of the document. Nested exception: An invalid XML character (Unicode: 0x3) was found in the element content of the document.

I checked that location in the file, and it has a value like left\"

I checked the XML Unicode library and applied various replaceAll calls, such as .replaceAll("\r\n", "") .replaceAll("\r", "") .replaceAll("\n", "") .replaceAll("\\0xA0", "") .replaceAll("\\0x3", ""), in various combinations. That still didn't help. One search result indicated that this is a control character that appears because of an error while encoding the file, which is why I used .replaceAll("\\0xA0", "").

Another thing I tried was to copy the data from the original file (.txt) to another file (.txt), keeping the encoding as UTF-8 on both sides, and to use the output as the input. Still the same error. I tried the same after converting it to a .json file. I also removed the words 'xml' and '!DOCTYPE' from the file to avoid any XML conflict, with the same result.

The error is actually raised by tFileInputJSON itself, so tMap and the other components can be ruled out.

Can someone shed some light on this issue?
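For reference, in Java regex syntax the character U+0003 is written as \x03 or \u0003, not \0x3, which may be why the replaceAll attempts above had no effect. Below is a minimal sketch, assuming the file content is available as a String before it reaches tFileInputJSON (for example in a tJava or tJavaRow step). The class and method names are made up for illustration and are not part of Talend.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative, not part of Talend.
final class ControlCharCleaner {

    // C0 control characters that XML 1.0 forbids: everything below U+0020
    // except tab (U+0009), line feed (U+000A) and carriage return (U+000D).
    private static final Pattern INVALID_XML_CONTROL =
            Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]");

    // Returns the input with those control characters removed; null stays null.
    static String stripInvalidXmlControlChars(String input) {
        if (input == null) {
            return null;
        }
        return INVALID_XML_CONTROL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // The sample value mimics the reported one: U+0003 hidden before the quote.
        String raw = "left\u0003\" and\ttabs stay";
        System.out.println(stripInvalidXmlControlChars(raw));
    }
}
```

This only covers the low control range that includes 0x3. If other invalid characters show up, the character class would need to be extended.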
Hi xdshi,

Thanks for the solution. It didn't solve the problem completely, but it helped to some extent. Previously, the issue mentioned above occurred as soon as the file was read by tFileInputJSON. Now it reads the file and parses rows until it hits the line with the invalid XML characters. Once an invalid XML character comes up, everything after it is skipped.

A PHP colleague of mine gave me this snippet: strip_tags($js->htmlContent->htmlBody)

I can't figure out how to use it in Talend.
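Since Talend jobs are generated as Java, something close to strip_tags could be written as a small Java helper. The sketch below is only a rough regex approximation of the PHP function, not a full HTML parser, and the class and method names are hypothetical. It is assumed here that it would be called on the HTML body field from a tJavaRow or a user routine. Note that it removes markup only, so a control character such as 0x3 would still have to be stripped separately.

```java
import java.util.regex.Pattern;

// Hypothetical helper; a rough regex approximation of PHP's strip_tags,
// not a full HTML parser and not part of Talend.
final class TagStripper {

    // Matches anything that looks like a tag: '<', any run of non-'>' characters, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Returns the input with tag-like sequences removed; null stays null.
    static String stripTags(String html) {
        if (html == null) {
            return null;
        }
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b></p>"));
    }
}
```

A regex like this will misbehave on a '>' inside attribute values or on script and comment blocks, so a real HTML parser would be the safer choice if the content is complex.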