Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

Six Stars

Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

Hi All,

I am pulling data from a REST api and uploading the same in DataBase. For this I have used 

tRESTClient-->tExtractJSONField-->tOutputOracle

 

This works fine, until the key part of (key:value) has $ 

 

Example JSON is for Chats. The chat might have attachments sometimes and there are extra elements to 'history' and JSON looks like

 

{
"chats":[
{

"id":"2001"
"comment": null,
"triggered_response": true,
"history": [
{
"name": "Guest1",
"msg_id": "1534494221340",
"msg": "Test message from Guest",
},
{
"name": "Support",
"msg_id": "1534494221340",
"msg": "Test message from support"
}

"attachment": {
"mime_type$string": "image/jpeg",
"name$string": "image.jpg",
"url$string": "image_url"
}

]
}
"count": 179049,
"next_url": "www.nexturl..."
}

 

This causes the tRestClient to break and gives following error

 

Exception in component tRESTClient_5 (Rest_API)
javax.ws.rs.client.ResponseProcessingException: Problem with reading the data, class org.dom4j.Document, ContentType: application/json;charset=utf-8.
[ERROR]: org.apache.cxf.jaxrs.utils.JAXRSUtils - Problem with reading the data, class org.dom4j.Document, ContentType: application/json;charset=utf-8.
at org.apache.cxf.jaxrs.impl.ResponseImpl.reportMessageHandlerProblem(ResponseImpl.java:446)
at org.apache.cxf.jaxrs.impl.ResponseImpl.doReadEntity(ResponseImpl.java:386)
at org.apache.cxf.jaxrs.client.AbstractClient.readBody(AbstractClient.java:528)
at org.apache.cxf.jaxrs.client.WebClient.handleResponse(WebClient.java:1126)
at org.apache.cxf.jaxrs.client.WebClient.doResponse(WebClient.java:1114)
at org.apache.cxf.jaxrs.client.WebClient.doChainedInvocation(WebClient.java:1039)
at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:894)
at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:862)
at org.apache.cxf.jaxrs.client.WebClient.invoke(WebClient.java:427)
at org.apache.cxf.jaxrs.client.WebClient.get(WebClient.java:607)
at dataland_data_import.rest_api_0_1.Rest_API.tRESTClient_5Process(Rest_API.java:856)
at dataland_data_import.rest_api_0_1.Rest_API.runJobInTOS(Rest_API.java:2435)
at dataland_data_import.rest_api_0_1.Rest_API.main(Rest_API.java:2284)
Caused by: javax.ws.rs.BadRequestException: HTTP 400 Bad Request
at org.apache.cxf.jaxrs.utils.SpecExceptions.toBadRequestException(SpecExceptions.java:84)
at org.apache.cxf.jaxrs.utils.ExceptionUtils.toBadRequestException(ExceptionUtils.java:121)
at org.apache.cxf.jaxrs.provider.json.JSONProvider.readFrom(JSONProvider.java:268)
at org.apache.cxf.jaxrs.provider.dom4j.DOM4JProvider.readFrom(DOM4JProvider.java:79)
at dataland_data_import.rest_api_0_1.Rest_API$2.readFrom(Rest_API.java:682)
at org.apache.cxf.jaxrs.provider.dom4j.DOM4JProvider.readFrom(DOM4JProvider.java:47)
at org.apache.cxf.jaxrs.utils.JAXRSUtils.readFromMessageBodyReader(JAXRSUtils.java:1379)
at org.apache.cxf.jaxrs.impl.ResponseImpl.doReadEntity(ResponseImpl.java:377)
... 11 more
Caused by: org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
at com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.checkQName(CoreDocumentImpl.java:2603)
at com.sun.org.apache.xerces.internal.dom.ElementNSImpl.setName(ElementNSImpl.java:121)
at com.sun.org.apache.xerces.internal.dom.ElementNSImpl.<init>(ElementNSImpl.java:84)
at com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.createElementNS(CoreDocumentImpl.java:2121)
at org.apache.cxf.staxutils.W3CDOMStreamWriter.createElementNS(W3CDOMStreamWriter.java:155)
at org.apache.cxf.staxutils.W3CDOMStreamWriter.createAndAddElement(W3CDOMStreamWriter.java:161)
at org.apache.cxf.staxutils.W3CDOMStreamWriter.writeStartElement(W3CDOMStreamWriter.java:108)
at org.apache.cxf.staxutils.StaxUtils.writeStartElement(StaxUtils.java:832)
at org.apache.cxf.staxutils.StaxUtils.copy(StaxUtils.java:760)
at org.apache.cxf.staxutils.StaxUtils.copy(StaxUtils.java:724)
at org.apache.cxf.jaxrs.provider.json.JSONProvider.copyReaderToWriter(JSONProvider.java:411)
at org.apache.cxf.jaxrs.provider.json.JSONProvider.readFrom(JSONProvider.java:227)
... 16 more

 

any suggestions how to handle this :-(


Accepted Solutions
Community Manager

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

OK, I have figured out what is going wrong and unfortunately there isn't really a *nice* fix to it. You need to parse this like XML (using XPath) because JsonPath irritatingly does not support the parent operator (which I agree is annoying). The problem you have is that XML does not allow element names with $ characters in them. Which kind of leaves you in limbo. What you can do is put a tJavaFlex or tJavaRow between your tRestClient and your tExtractJson component and replace any $ characters with another character (maybe a "_"). I did this to try it out and it worked perfectly.

 

I agree it is irritating, but this isn't really a Talend bug. It is more of a flaw of the flexibility (ie allowing your to parse Json as XML). 

 

The code I added can be seen below....

 

row1.string = string.replace('$', '_');

I hope this helps.


All Replies
Community Manager

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

Do you have the "Convert Response to DOM Document" option ticked in the Advanced Settings tab of the tRestClient? If so, try unticking it and running.

Six Stars

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

Thanks rhall_2_0, that did resolved the error on the tRESTClient. However now I am getting error in tExtractJSONField

ReadBy: Xpah
JSON Field : string
Loop XPath Query : "/chats/history[*]"

Error on line 2 of document : Element type "mime_type" must be followed by either attribute specifications, ">" or "/>". Nested exception: Element type "mime_type" must be followed by either attribute specifications, ">" or "/>".
Seven Stars

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

I don't think /chats/history[*] is correct. For a start, chats is an array. You probably need to expand on the xpath "/chats" and assign the "history" tag to a column, then expand that on "/" and pull out the "name", "msg_id", and "msg" elements.

 

You might be able to iterate directly using something like "/chats[*]/history", but I think I might be mixing XPath and JsonPath terminology there!

Community Manager

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

Read the file using JsonPath. If you're working with JSON it makes sense to use JSON tools. Your loop should be something like ....

 

"$.chats.history[*]"

Six Stars

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

HiPhilHibbs,

looping on element is not a big issues here for now, the main issue that I am facing is tExtractJSONElements not being able to parse/extract information from tRestClient because there is $ in the one of the key of key:value pair.

 

"attachment": {
"mime_type$string": "image/jpeg",
"name$string": "image.jpg",
"url$string": "image_url"
}

 

A side note Loop XPath Query : "/chats/history[*]" did work as expected and I got expected result 

 

ChatID   History_Name   History_Msg

2001      Guest1              Test message from Guest

2001      Support              Test message from support

 and so on..

 

Regarding the reply from rhall_2_0

I did try with reading the file with JSON Path and used the Loop JSON Query, but the issue here is I get the values of looping element but not the parent element 

 

Something like this 

ChatID   History_Name   History_Msg

NULL     Guest1              Test message from Guest

NULL     Support              Test message from support

 

Community Manager

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

I just tried to quickly try something out and it showed that the JSON you've given us is poorly formatted. Should it be more like this?

 

{  
   "chats":[  
      {  
         "id":"2001",
         "comment":null,
         "triggered_response":true,
         "history":[  
            {  
               "name":"Guest1",
               "msg_id":"1534494221340",
               "msg":"Test message from Guest"
            },
            {  
               "name":"Support",
               "msg_id":"1534494221340",
               "msg":"Test message from support"
            }
         ],
         "attachment":[  
            {  
               "mime_type$string":"image/jpeg",
               "name$string":"image.jpg",
               "url$string":"image_url"
            }
         ]
      }
   ],
   "count":179049,
   "next_url":"www.nexturl..."
}
Seven Stars

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

Ah. I had that exact problem with a key that had a "." in it, I couldn't parse it using JsonPath so I had to switch to XPath. Maybe you are having the opposite problem, that XPath can't parse it because of the "$". Try switching to JsonPath!

 

And hope that you don't get a tag that has both a "$" and a "."... Smiley Sad

Six Stars

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

@rhall_2_0 -Yes the format you mentioned is correct.

 

@PhilHibbs Yes I tried with JSON Path rather than XPath and used the Loop JSON Query, but the issue here is I get the values of looping element but not the parent element 

 

Something like this 

ChatID   History_Name   History_Msg

NULL     Guest1              Test message from Guest

NULL     Support              Test message from support

 

 

 

Seven Stars

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

How about extracting the JSON in two stages - one to expand the "/chats[*]" and pull out the "id", "attachment", and the "history", then a second stage that expands the "history" with a Loop query of "$" (or "/" for Jsonpath, as there will be no troublesome $s here ) and pulls out the history elements.

 

Community Manager

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

OK, I have figured out what is going wrong and unfortunately there isn't really a *nice* fix to it. You need to parse this like XML (using XPath) because JsonPath irritatingly does not support the parent operator (which I agree is annoying). The problem you have is that XML does not allow element names with $ characters in them. Which kind of leaves you in limbo. What you can do is put a tJavaFlex or tJavaRow between your tRestClient and your tExtractJson component and replace any $ characters with another character (maybe a "_"). I did this to try it out and it worked perfectly.

 

I agree it is irritating, but this isn't really a Talend bug. It is more of a flaw of the flexibility (ie allowing your to parse Json as XML). 

 

The code I added can be seen below....

 

row1.string = string.replace('$', '_');

I hope this helps.

Six Stars

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

thanks a lot @rhall_2_0,  I was trying to do something similar yesterday and ended up with 

image.png

 

image.pngimage.pngimage.png

This is still in testing, however there are few points I wanted to share in case it it's of use to anyone 

 

tRestClient

In the very beginning (before $ issue) it was tRestClient ->TExtractJSON->tOracleOut. Somehow I was getting error and this was not working and then when I changed the accept type from "JSON " to "ANY". Then I started getting the $ issue, this where your suggestion "Try Unchecking - Convert Response to DOM document" helped me. However this option does not appear if the "Accept Type = Any". So I ended up using  Accept Type JSON and unchecked the DOM option, these were the final setting and worked for me

 

tJavaRow

got the code from internet, First I was using replaceAll() and struggled for some time and then after some looking around ended up on replace().

 

tJSONExtract

Read By JSON doesn't give required outcome in combination with 'Loop Jsonpath Query'. I did a lot of looking around and most the content I came across ended up suggesting XPath rahter than JSON Path hence ended up using Read By XPath in combination with Loop XPath Query. 

 

Six Stars

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

Hi @rhall_2_0@PhilHibbs,

The requirement is to get all the chants from API (one time full data dump) and then import chats on daily basis recursively.

 

The next thing I help in figuring out is, API returns data in pages and there are 180K pages.

 

1) How do I extract information from one page and then move to the next one until there is no "next_url". 

2) Once I am done with one time full data dump how do I pull chats and limit it to only current data.

 

Page 1 gives
{
  "chats": [

                  all chat related attributes we discussed earlier

               ],
  "count": 179451,
  "next_url": "next_url_here"

}


Page2 gives

{
  "chats": [

                  all chat related attributes we discussed earlier

               ],
  "count": 179451,
  "prev_url": "previous_url_here"
  "next_url": "next_url_here"

}

 

Page 3 gives ......

 

Documentation on API

 

Searchable fields

 

You can query the following fields:

Field Description Field name

Agent Nameagent_names
Visitor Namevisitor_name
Visitor Email IDvisitor_email
Timestamptimestamp
End Timestampend_timestamp
Chat Typeschat_type; one of offline_msg or chat
Ratingrating; one of good or bad
Free TextNot applicable (default)

A combination of two or more queries is also supported.

 

Example queries

Make sure to url-encode the query string in your code.

Search Query Returns

agent_names:Tom AND visitor_name:HarryReturns chats between agent Tom and visitor Harry
type:chat AND chocolateReturns chats with the term chocolate
visitor_email:john@doe.com AND timestamp:[2014-10-10 TO 2014-12-12]Returns chats with visitor with email ID john@doe.com between 2014-10-10 and 2014-12-12
timestamp:[2014-10-16 TO *] AND chocolateReturns chats that started after 2014-10-16 with chocolate
end_timestamp:[2014-10-25 TO *] AND rating:goodReturns chats that ended after 2014-10-25 with good as the chat rating
Six Stars

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

Hi @rhall_2_0@PhilHibbs,

 

Need help in Step2. The oroginal requirement is to pull all chat data from API (one time full data dump) and then pull chat on daily basis.The output is spread across 180K pages with each page giving URL to next and previous (except first page which have only 'nex_url' and last page with have only 'prev_url'). How do I modify the job to

1) Pull all data for one time data dump, 180k pages

2) Pull data on daily basis for current day or extract data until the timestamp is current day.

 

Example output from API

 

Page 1 gives
{
    "chats": [

                  all chat related attributes we discussed earlier

                 ],
    "count": 179451,
    "next_url": "next_url_here"

}


Page2 gives

{
    "chats": [

                 all chat related attributes we discussed earlier

                 ],
    "count": 179451
    "prev_url": "previous_url_here"
    "next_url": "next_url_here"

}

Page 3 gives ......

Community Manager

Re: Issue with tRESTClient with "$" (dollar sign) in key of JSON structure

Can you start a new thread for this question please? It is very different to the original one and will likely be quite useful to others in the future. If we answer it here it will just be lost as it doesn't really relate to the subject of this question.

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog

How Media Organizations Achieved Success with Data Integration

Learn how media organizations have achieved success with Data Integration

Read

Definitive Guide to Data Quality

Create systems and workflow to manage clean data ingestion and data transformation.

Download

Tutorial

Introduction to Talend Open Studio for Data Integration.

Watch