Best practices for splitting complex json

Highlighted
Five Stars

Best practices for splitting complex json

Hi guys,

i'm pretty new in talend and imported my first JSON. The problem is that this file is very complex and I don't know exactly how to read it right.

Let me give you an example:

  "data": [
    {
      "EANCode": "1234567",
      "Info": {
        "active": true,
        "articleID": 1234,
        "articleKey": "1234",
      },
      "attributes": [
        {
          "Key": "date",
          "Value": "2011-11-11"
        },
        {
          "Key": "color",
          "Value": "red"
        },
        {
          "Key": "Key@Aufmachung",
          "Value": "A12    @01"
        },
        {
          "Key": "Key@TkgNr",
          "Value": "A1@123"
        }
      ],
      "b2csku": "12345678

It's a huge file, around 22k lines, ~500 per item. My biggest problem is the value in "Key" and "Value" and the fact that they always contain different amounts of information which is why I can't 'hardcode' it. My current approach is a simple string manipulation to extract the key value values. My idea was to cut off the first characters in tmap, but I don't think the approach is very successful either. It also doesn't help me with the variable amount of data in "attributes". Do you have any tips or ideas for me?

 

Best regards,

BooWseR

Tags (1)
Community Manager

Re: Best practices for splitting complex json

Five Stars

Re: Best practices for splitting complex json

Thanks for your answer. I have now iterated the JSON in several runs.

iterate.PNGThis works quite well so far, but now I have problems merging the data into columns. For example, I use tMap to write the values into my own columns:

(data.attributes.Key.equals("date")) ? data.attributes.Value : null --> date

 

Now my output looks like:

EANdatecolorKey@AufmachungKey@TkgNr
12345672011-11-11nullnullnull
1234567nullrednullnull
1234567nullnullA12 @01null
1234567nullnullnullA1@123

 

I would now like to merge the rows, but I have no idea how. I tried tDenormalize and select all columns except EAN, but it doesnt effect my output. I also tried tmemorizeRows, but even without success. What am I doing wrong?

Community Manager

Re: Best practices for splitting complex json

A simple way to do this (if you know that there is one row per EAN) is to use the tAggregateRow component. Group by the EAN column and output the other columns using the "first" function and tick the "Ignore null values" tickbox.

What’s New for Talend Spring ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog

How Media Organizations Achieved Success with Data Integration

Learn how media organizations have achieved success with Data Integration

Read

6 Ways to Start Utilizing Machine Learning with Amazon We Services and Talend

Look at6 ways to start utilizing Machine Learning with Amazon We Services and Talend

Blog