Best practices for splitting complex json

Highlighted
Six Stars

Best practices for splitting complex json

Hi guys,

i'm pretty new in talend and imported my first JSON. The problem is that this file is very complex and I don't know exactly how to read it right.

Let me give you an example:

  "data": [
    {
      "EANCode": "1234567",
      "Info": {
        "active": true,
        "articleID": 1234,
        "articleKey": "1234",
      },
      "attributes": [
        {
          "Key": "date",
          "Value": "2011-11-11"
        },
        {
          "Key": "color",
          "Value": "red"
        },
        {
          "Key": "Key@Aufmachung",
          "Value": "A12    @01"
        },
        {
          "Key": "Key@TkgNr",
          "Value": "A1@123"
        }
      ],
      "b2csku": "12345678

It's a huge file, around 22k lines, ~500 per item. My biggest problem is the value in "Key" and "Value" and the fact that they always contain different amounts of information which is why I can't 'hardcode' it. My current approach is a simple string manipulation to extract the key value values. My idea was to cut off the first characters in tmap, but I don't think the approach is very successful either. It also doesn't help me with the variable amount of data in "attributes". Do you have any tips or ideas for me?

 

Best regards,

BooWseR

Tags (1)
Highlighted
Community Manager

Re: Best practices for splitting complex json

Highlighted
Six Stars

Re: Best practices for splitting complex json

Thanks for your answer. I have now iterated the JSON in several runs.

iterate.PNGThis works quite well so far, but now I have problems merging the data into columns. For example, I use tMap to write the values into my own columns:

(data.attributes.Key.equals("date")) ? data.attributes.Value : null --> date

 

Now my output looks like:

EANdatecolorKey@AufmachungKey@TkgNr
12345672011-11-11nullnullnull
1234567nullrednullnull
1234567nullnullA12 @01null
1234567nullnullnullA1@123

 

I would now like to merge the rows, but I have no idea how. I tried tDenormalize and select all columns except EAN, but it doesnt effect my output. I also tried tmemorizeRows, but even without success. What am I doing wrong?

Highlighted
Community Manager

Re: Best practices for splitting complex json

A simple way to do this (if you know that there is one row per EAN) is to use the tAggregateRow component. Group by the EAN column and output the other columns using the "first" function and tick the "Ignore null values" tickbox.

2019 GARTNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

Best Practices for Using Context Variables with Talend – Part 2

Part 2 of a series on Context Variables

Blog

Best Practices for Using Context Variables with Talend – Part 1

Learn how to do cool things with Context Variables

Blog

Migrate Data from one Database to another with one Job using the Dynamic Schema

Find out how to migrate from one database to another using the Dynamic schema

Blog