Issue tExtractJSONFields Encoding - Special Characters

Hi,

I have a problem in my job: the tExtractJSONFields component appears to be applying some sort of encoding to my JSON message. It alters some of the special characters in the message, which causes problems in the final file I output.

For example, after extraction:

http://example.com/test becomes http:\/\/example.com\/test

USA/UK/Europe/Australia/New Zealand becomes USA\/UK\/Europe\/Australia\/New Zealand

Example With – Dash becomes Example With \u2013 Dash
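All three outputs look like standard JSON string escaping rather than a change of character encoding: in JSON, `\/` is a permitted escape for `/`, and `\u2013` is the escape for U+2013 EN DASH (the – character). Any JSON parser turns them back into the original characters. As a minimal sketch, assuming the escaped text is available as a Java String (for example in a tJavaRow; the class name JsonUnescape is hypothetical), decoding the escapes could look like this:

```java
public class JsonUnescape {

    // Decodes JSON string escape sequences (RFC 8259, section 7) back into
    // plain characters. Assumes the input is the content of a JSON string
    // (no surrounding quotes) and that every backslash starts a valid escape.
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);           // ordinary character, copy as-is
                continue;
            }
            char next = s.charAt(++i);   // character after the backslash
            switch (next) {
                case '/':  out.append('/');  break;  // escaped slash -> "/"
                case '\\': out.append('\\'); break;
                case '"':  out.append('"');  break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                case 'u':
                    // four hex digits follow, e.g. 2013 = EN DASH
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("Invalid escape: \\" + next);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("http:\\/\\/example.com\\/test"));
        System.out.println(unescape("USA\\/UK\\/Europe\\/Australia\\/New Zealand"));
        System.out.println(unescape("Example With \\u2013 Dash"));
    }
}
```

Characters outside the Basic Multilingual Plane arrive as two consecutive escapes (a surrogate pair); appending each decoded char in turn reassembles them correctly.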

My job flow is as follows:

[Image: Job Flow]

I call a REST client that returns a JSON response (encoded in UTF-8), which I then parse with tExtractJSONFields, set up as follows:

[Image: tExtractJSONFields]

According to the documentation, tExtractJSONFields should have an advanced setting for the encoding, but mine (Talend 6.3.1) does not show this option. I'm not sure why, or whether it would fix the issue.

[Image: Advanced settings]

My understanding is that this component converts the entire body of the response into a single string, so I'm not sure why it changes the encoding of the response. I've set tFileOutputDelimited to UTF-8, but that doesn't restore the original characters either: all of the changes made by tExtractJSONFields remain in the output file.
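This would also explain why setting tFileOutputDelimited to UTF-8 has no effect: once a character has been turned into an escape sequence, the text is plain ASCII (a backslash, a `u` and four hex digits), and ASCII text is written out byte-for-byte the same in any ASCII-compatible encoding. The sketch below (hypothetical class name EscapeBytes) compares the bytes of the escaped and the intended text:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EscapeBytes {

    // True when the text encodes to identical bytes in UTF-8 and US-ASCII,
    // i.e. it contains only ASCII characters.
    public static boolean sameBytesInUtf8AndAscii(String text) {
        return Arrays.equals(text.getBytes(StandardCharsets.UTF_8),
                             text.getBytes(StandardCharsets.US_ASCII));
    }

    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // What arrives after extraction: backslash, 'u', then the hex digits 2013
        String escaped = "Example With \\u2013 Dash";
        // The intended text, with the real en dash (code point 0x2013)
        String original = "Example With " + (char) 0x2013 + " Dash";

        System.out.println(sameBytesInUtf8AndAscii(escaped));   // true
        System.out.println(sameBytesInUtf8AndAscii(original));  // false
        // The real dash takes 3 bytes in UTF-8; the escape takes 6 ASCII bytes
        System.out.println(utf8Length(escaped) + " vs " + utf8Length(original)); // 24 vs 21
    }
}
```

In other words, the output file's encoding only matters once the escapes have been decoded back into real characters.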

I would really appreciate any help, and I'm happy to give more info if I've missed something useful!

Re: Issue tExtractJSONFields Encoding - Special Characters

Hi!

It's not always clear from the documentation, but the Encoding option is available in Advanced Settings if you choose XPath instead of JSONPath :-)

Both work well for JSON, so you can test it.

Re: Issue tExtractJSONFields Encoding - Special Characters

Thanks Vapukov! That was really helpful. I can see the Encoding option now and am switching over to XPath. In my first attempt it fixed most of the introduced backslashes, and some of the formatting is even better, but there are still some issues. Where there are XML/HTML tags, a backslash is still being introduced.

For example, <BR>xxx</BR> becomes <BR>xxx<\/BR>. XPath also introduced a new problem: my integers are being replaced by strings, e.g.

"test": 1000 becomes "test": "1000"

Finally, my empty arrays disappear from the extraction.
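One plausible explanation, offered as an assumption rather than a confirmed description of the component, is that XPath mode works on an XML view of the JSON: XML text nodes carry no type, so 1000 can only come back as the string "1000", and an empty array has no child element to map back to []. If the numbers must stay numeric, a hypothetical post-processing helper (the class RetypeValue and method toJsonLiteral are illustrative names, not Talend APIs) could re-type integer-looking values when building the output. This is a heuristic: it would also strip the quotes from values that are meant to be strings, such as IDs made only of digits.

```java
import java.util.regex.Pattern;

public class RetypeValue {

    // JSON integer grammar: optional minus, then 0 or a non-zero digit followed by digits.
    // Leading zeros ("007") do not match, so such values stay quoted strings.
    private static final Pattern JSON_INT = Pattern.compile("-?(0|[1-9][0-9]*)");

    // Hypothetical helper: renders an extracted value as a JSON literal,
    // emitting integer-looking strings unquoted.
    public static String toJsonLiteral(String value) {
        if (value == null) {
            return "null";
        }
        if (JSON_INT.matcher(value).matches()) {
            return value;  // "1000" -> 1000
        }
        // Minimal quoting: escapes only backslash and double quote
        // (control characters are not handled); "/" is left unescaped.
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println("\"test\": " + toJsonLiteral("1000"));    // "test": 1000
        System.out.println("\"name\": " + toJsonLiteral("USA/UK"));  // "name": "USA/UK"
    }
}
```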

I'm going to experiment with it more and see if it's an issue with my XPath query, but if you recognise these problems, any help would be great!