Issue tExtractJSONFields Encoding - Special Characters

Four Stars



I've been having a problem in my job where the tExtractJSONFields component appears to be applying some sort of encoding to my JSON message. It affects some of the special characters in the message, which causes an issue in the final file I output.


For example, when extracted:

USA/UK/Europe/Australia/New Zealand
becomes
USA\/UK\/Europe\/Australia\/New Zealand

Example With – Dash
becomes
Example With \u2013 Dash
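
Both changes match standard JSON string escapes: the JSON grammar allows / to be written as \/ and any character as \u followed by four hex digits. So the extracted values look like text that is still JSON-escaped rather than text in the wrong character set. Whether tExtractJSONFields produces them this way is not confirmed in this thread. Below is a minimal Java sketch of decoding such escapes, assuming the value arrives as a plain String (for instance in a routine or a tJavaRow); the class and method names are hypothetical.

```java
public class JsonUnescape {

    /** Reads four hex digits starting at 'from'; returns -1 if they are missing or invalid. */
    private static int hex4(String s, int from) {
        if (from + 4 > s.length()) {
            return -1;
        }
        int value = 0;
        for (int k = from; k < from + 4; k++) {
            int d = Character.digit(s.charAt(k), 16);
            if (d < 0) {
                return -1;
            }
            value = value * 16 + d;
        }
        return value;
    }

    /** Decodes the string escapes defined by the JSON grammar; unknown escapes are kept as-is. */
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c);
                i++;
                continue;
            }
            char next = s.charAt(i + 1);
            switch (next) {
                case '"':  out.append('"');  i += 2; break;
                case '\\': out.append('\\'); i += 2; break;
                case '/':  out.append('/');  i += 2; break;
                case 'b':  out.append('\b'); i += 2; break;
                case 'f':  out.append('\f'); i += 2; break;
                case 'n':  out.append('\n'); i += 2; break;
                case 'r':  out.append('\r'); i += 2; break;
                case 't':  out.append('\t'); i += 2; break;
                case 'u':
                    // Backslash, 'u', then four hex digits encode one UTF-16 code unit.
                    int unit = hex4(s, i + 2);
                    if (unit >= 0) {
                        out.append((char) unit);
                        i += 6;
                    } else {
                        out.append(c);
                        i++;
                    }
                    break;
                default:
                    // Not a JSON escape: keep the backslash literally.
                    out.append(c);
                    i++;
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("USA\\/UK\\/Europe\\/Australia\\/New Zealand"));
        System.out.println(unescape("Example With \\u2013 Dash"));
    }
}
```

In a tJavaRow this could be called as output_row.field = JsonUnescape.unescape(input_row.field), where field is a placeholder column name. Running it on a value that is not actually JSON-escaped would also rewrite any literal backslash sequences that happen to look like escapes, so it only makes sense where the escaping is known to be present.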


My job flow is as follows:

[Image: Job Flow]

I call a REST client that returns a JSON response (encoded in UTF-8), which I then extract with tExtractJSONFields (set up as follows):


According to the documentation for tExtractJSONFields, there is supposed to be an advanced setting for the encoding, but mine is missing this option (Talend ver 6.3.1). I'm not sure why, or whether it would fix the issue.



My understanding is that this component converts the entire body of the response to a single string, so I'm not sure why it is changing the encoding of the response. I've set the tFileOutputDelimited to UTF-8, but that doesn't seem to encode the string correctly either: all of the changes made by tExtractJSONFields remain in the output file.
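
One reason the UTF-8 setting on the output would not help: if the string already contains the escape text (a backslash followed by u2013) rather than the dash itself, writing it as UTF-8 simply writes those six ASCII characters. The Java sketch below illustrates that general point; it is not a trace of what tFileOutputDelimited does internally, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class EscapeVsCharset {

    /** Returns the UTF-8 bytes of a string as space-separated upper-case hex. */
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String realDash = "\u2013";      // one character: the en dash itself
        String escapedDash = "\\u2013";  // six ASCII characters: backslash, u, 2, 0, 1, 3

        System.out.println(utf8Hex(realDash));     // E2 80 93
        System.out.println(utf8Hex(escapedDash));  // 5C 75 32 30 31 33
    }
}
```

The first line is the three-byte UTF-8 encoding of the dash; the second is the escape sequence stored as ordinary text, which no choice of output charset will turn back into a dash.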


I would really appreciate any help. I'm happy to give more info if I've missed something useful!

Fourteen Stars

Re: Issue tExtractJSONFields Encoding - Special Characters



It is not always clear from the documentation, but Encoding becomes available in Advanced Settings if you choose XPath instead of JSONPath :-)


Both work well for JSON, so you can test it.



Four Stars

Re: Issue tExtractJSONFields Encoding - Special Characters

Thanks Vapukov! That was really helpful. I can see the encoding option now and am switching over to XPath. I've tried it, and although it fixed the majority of the introduced backslashes and even improved some of the formatting, there are still some issues. Where there are XML/HTML tags, a backslash is still being introduced.

e.g. <BR>xxx</BR> becomes <BR>xxx<\/BR>. Something new is that my integers are being replaced by strings, e.g.

"test": 1000 becomes "test": "1000". Finally, my empty arrays are disappearing from the extraction.


I'm going to play around with it more and see if it's an issue with my XPath query. But if you recognise the problems, any help would be great!

