Six Stars

Normalizing multiple input columns at once

Below is the structure of a json file which we have at the source, we can extract each columns from the json file. The problem we have is that each column have a comma separated values (each value has to converted into a separate row).
In the below example; the 1st value in the column name "Title" in Level 1 should match to the 1st value in the Column name "revisists" in Level 1. Basically, the nth value of a column name (of the same level) should match to the nth value of the other column name in the same level. I am providing the input and output record below. Can anyone help me to get this issue resolved.

Please note, there can be more than 2 values (comma separated values) in each level; and the upper level should have the same value populated for both of the records.
i.e If there is m records in Level 0 and n records in Level 1 and o records in Level 2 then there has to be mxnxo records at the output where m records have the same values for other values of n etc.,

 

count   ---Level 0
_kmd
_kmd.ect
_kmd.lmt
programType
lessons
 title     --Level 1
 revisits
 id
 comprehension
 textAvailable
 files
  _acl --Level 2
   creator --Level 3
  _public
  filename
  _kmd
   ect -- Level 3
   lmt
  mimetype --Level 2
  _id
  size
 completion --Level 1
 audioAvailable
 surveyUrl
 time
 videoAvailable
_acl --Level 0
 creator
title
_id --Level 0

 

Input File : 
[{"_id":"58e3e47063fe11cb5aa36cea","title":"SAMHSA IOP Lessons 1 - 3","lessons":[{"title":"Lesson 1: Alcohol","surveyUrl":"","files":[{"size":485230,"mimeType":"application/pdf","mimetype":"application/pdf","_filename":"1_Alcohol_RP1.pdf","_public":true,"_id":"696a8cb4-45a7-438c-8648-20600995eb96","_acl":{"creator":"58ddb162db2ef12435d8cb85"},"_kmd":{"lmt":"2017-04-04T21:16:50.881Z","ect":"2017-04-04T21:16:50.881Z"},"_data":{}},{"size":47069296,"mimeType":"video/mp4","mimetype":"video/mp4","_filename":"Alcohol.mp4","_public":true,"_id":"de8819aa-6eea-4037-97e9-0687af3061b9","_acl":{"creator":"58ddb162db2ef12435d8cb85"},"_kmd":{"lmt":"2017-04-04T21:17:00.480Z","ect":"2017-04-04T21:17:00.480Z"},"_data":{}},{"size":5693373,"mimeType":"audio/mp3","mimetype":"audio/mp3","_filename":"Alcohol_1.mp3","_public":true,"_id":"cd49904f-3c07-47d1-a69a-105f982ca0ec","_acl":{"creator":"58ddb162db2ef12435d8cb85"},"_kmd":{"lmt":"2017-04-04T21:22:55.780Z","ect":"2017-04-04T21:22:55.780Z"},"_data":{}}],"textAvailable":true,"id":1491330153,"completion":0,"time":"","revisits":0,"comprehension":0,"videoAvailable":true,"audioAvailable":true},{"title":"Lesson 2: Boredom","files":[{"size":29475149,"mimeType":"video/mp4","mimetype":"video/mp4","_filename":"Boredom.mp4","_public":true,"_id":"49acb6df-e74d-4c86-b0a9-fbd7d08b20b5","_acl":{"creator":"58ddb162db2ef12435d8cb85"},"_kmd":{"lmt":"2017-04-04T21:26:56.570Z","ect":"2017-04-04T21:26:56.570Z"},"_data":{}},{"size":4245573,"mimeType":"audio/mp3","mimetype":"audio/mp3","_filename":"Boredom.mp3","_public":true,"_id":"5afc0339-81df-4d0a-ab15-ea0b25ee011c","_acl":{"creator":"58ddb162db2ef12435d8cb85"},"_kmd":{"lmt":"2017-04-04T21:28:44.714Z","ect":"2017-04-04T21:28:44.714Z"},"_data":{}},{"size":358300,"mimeType":"application/pdf","mimetype":"application/pdf","_filename":"1_Boredom_RP2.pdf","_public":true,"_id":"8e6dbeb5-ac50-4392-ac3a-31ac96f69564","_acl":{"creator":"58ddb162db2ef12435d8cb85"},"_kmd":{"lmt":"2017-04-04T21:29:07.291Z","ect":"2017-04-04T21:29:07.291Z"},"_data":{}}],"videoAvailable":true,"audioAvailable":true,"textAvailable":true,"id":1491341420,"completion":0,"time":"","revisits":0,"comprehension":0},{"title":"Lesson 3: Avoiding Relapse Drift","surveyUrl":"","files":[{"size":687737,"mimeType":"application/pdf","mimetype":"application/pdf","_filename":"1_Avoiding Relapse Drift_RP3A.pdf","_public":true,"_id":"7b27d3f7-3166-42fc-88b6-76cd9c366b0a","_acl":{"creator":"58ddb162db2ef12435d8cb85"},"_kmd":{"lmt":"2017-04-04T21:30:12.156Z","ect":"2017-04-04T21:30:12.156Z"},"_data":{}},{"size":26082267,"mimeType":"video/mp4","mimetype":"video/mp4","_filename":"Avoiding Relapse Drift v1.mp4","_public":true,"_id":"f0326221-a834-44c9-b78b-94061a7de2b5","_acl":{"creator":"58ddb162db2ef12435d8cb85"},"_kmd":{"lmt":"2017-04-04T21:31:00.683Z","ect":"2017-04-04T21:31:00.683Z"},"_data":{}},{"size":3141799,"mimeType":"audio/mp3","mimetype":"audio/mp3","_filename":"Avoiding Relapse Drift.mp3","_public":true,"_id":"e3383266-a6d3-416f-8473-31931ce4ea1a","_acl":{"creator":"58ddb162db2ef12435d8cb85"},"_kmd":{"lmt":"2017-04-04T21:32:06.361Z","ect":"2017-04-04T21:32:06.361Z"},"_data":{}}],"textAvailable":true,"videoAvailable":true,"audioAvailable":true,"id":1491341677,"completion":0,"time":"","revisits":0,"comprehension":0},{"title":"Lesson 1-3 Worksheet","surveyUrl":"https://www.surveygizmo.com/s3/3479631/Crossroads-IOP-Lesson-1-3-copy","files":[],"id":1491414764,"completion":0,"time":"","revisits":0,"comprehension":0}],"programType":"CORE East IOP","_acl":{"creator":"58ddb162db2ef12435d8cb85"},"_kmd":{"lmt":"2017-04-05T18:01:06.662Z","ect":"2017-04-04T18:22:40.275Z"},"count":0}]

 

  • Big Data
  • Data Integration
4 REPLIES
Community Manager

Re: Normalizing multiple input columns at once

Hi
Take a look at tNormalize component, I think you can use this component more times for each level if there exist multiple values. Let me know if it fit your need or you have troubles to use this component.

Regards
Shong

----------------------------------------------------------
Talend | Data Agility for Modern Business
Six Stars

Re: Normalizing multiple input columns at once

No, tNormalize does not fit my requirement as it can only normalize only one column at any time. I am guessing this require a java code to do the same. Kindly help me with the same.

Community Manager

Re: Normalizing multiple input columns at once

Yes, it can only normalize only one column at a time, however, you can use it multiple times for each column that might have several values separated by comma.

----------------------------------------------------------
Talend | Data Agility for Modern Business
Six Stars

Re: Normalizing multiple input columns at once

​I am thinking of using the concept mentioned in the below URL. Can you
please help me to know the actual expression on how to count a no. of
comma's in a particular column of value ["a","b.x","c\/a"] and also how to
extract the value (to use it in a variable) as "a", "b.x" and "c/a"
respectively​
URL:
https://stackoverflow.com/questions/26365036/talend-generating-n-multiple-rows-from-1-row