Denormalize/Normalize data rows which have dynamic number of values

Highlighted
Six Stars

Denormalize/Normalize data rows which have dynamic number of values

Hi All

 

I have some data which comes from an API XML that is in a format which needs to be restructured. 

 

<RNo>1,2,3,4,5,6,7,8</RNo>
<Name>High Range,Angel Whisper,Bonroca,Joyrize,Miss Ellmya,Not Really True,Spiritus Sancti,Tycoon Storm</Name>
<avgprice>354.57,19.52,22.13,87.69,112.79,27.74,0,0</avgprice>
<price>1.26,4.2,2.2,1.65,1.26,1.75,8,0</price>

What needs to occur, is the value in the before the first comma needs to be aligned to the value from each column for before the first comma. Then, the values from each row that are in the second column are the values from each row before the second comma.

 

I have been trying the denormalize and normalize functions but not having any luck, does anyone have any ideas?

 

Cheers

 

dv


Accepted Solutions
Community Manager

Re: Denormalize/Normalize data rows which have dynamic number of values

@david_beaty's way will definitely work, but I have had exactly this problem before, so can offer an alternative which might work for you. This way will deal with the scenario where the number of items in each row can be different (assuming that item 1 in all rows are always linked, as are item 2, etc, etc).

 

I have built a quick job to demonstrate this. My job looks like this....

Screenshot 2019-04-29 at 09.59.17.png

I've hardcoded your values in 4 columns in the tFixedFlowInput like this.....

Screenshot 2019-04-29 at 09.59.50.png

I then added a tMap and set it up like below....

Screenshot 2019-04-29 at 10.00.04.png

The important details to note here are that I am setting the lookup to reload at each row and am passing the values from my main row to the globalMap.Each column has a key created with the same name. The reason for this is to pass the column values to the tJavaFlex that has been added as my lookup. This lookup will fire for every row of data sent to the tMap. The code in the tJavaFlex will identify how many rows of data are included in your data set, then split the data into those rows for you and send it back. The tMap will then output the rows.

 

The configuration of the tJavaFlex can be seen below.....

Screenshot 2019-04-29 at 10.00.19.png

 

I'll add the actual code below (with comments) so that you can copy it....

 

Start Code

//Split each of your column values into arrays
String[] RNo = ((String)globalMap.get("RNo")).split(",");  
String[] Name = ((String)globalMap.get("Name")).split(",");  
String[] AvgPrice = ((String)globalMap.get("AvgPrice")).split(","); 
String[] Price = ((String)globalMap.get("Price")).split(",");

//Find the max length of the arrays
int maxCount = Math.max(RNo.length,Math.max(Name.length, Math.max(AvgPrice.length,Price.length)));

//Open a For Loop to iterate over the arrays using the maxCount
for(int i=0; i<maxCount; i++){

 

Main Code

//For each column check that there is a value in the array and if so 
//set the column to be that value. Otherwise set it to null
row2.RNo = RNo.length>i ? RNo[i] : null;
row2.Name = Name.length>i ? Name[i] : null;
row2.AvgPrice = AvgPrice.length>i ? AvgPrice[i] : null;
row2.Price = Price.length>i ? Price[i] : null; 

End Code

// Close the For Loop
}

When I ran this job, I got the following output from the tLogRow....

Screenshot 2019-04-29 at 09.59.36.png


All Replies
Nine Stars

Re: Denormalize/Normalize data rows which have dynamic number of values

Hi,

 

You'd be able to achieve this by tNormalise each field one at a time, into a HashMap or similar, adding in a sequence to them, and then join back in with a tMap.

 

Regards David
Dont forget to give Kudos when an answer is helpful or mark the answer as the solution.
Community Manager

Re: Denormalize/Normalize data rows which have dynamic number of values

@david_beaty's way will definitely work, but I have had exactly this problem before, so can offer an alternative which might work for you. This way will deal with the scenario where the number of items in each row can be different (assuming that item 1 in all rows are always linked, as are item 2, etc, etc).

 

I have built a quick job to demonstrate this. My job looks like this....

Screenshot 2019-04-29 at 09.59.17.png

I've hardcoded your values in 4 columns in the tFixedFlowInput like this.....

Screenshot 2019-04-29 at 09.59.50.png

I then added a tMap and set it up like below....

Screenshot 2019-04-29 at 10.00.04.png

The important details to note here are that I am setting the lookup to reload at each row and am passing the values from my main row to the globalMap.Each column has a key created with the same name. The reason for this is to pass the column values to the tJavaFlex that has been added as my lookup. This lookup will fire for every row of data sent to the tMap. The code in the tJavaFlex will identify how many rows of data are included in your data set, then split the data into those rows for you and send it back. The tMap will then output the rows.

 

The configuration of the tJavaFlex can be seen below.....

Screenshot 2019-04-29 at 10.00.19.png

 

I'll add the actual code below (with comments) so that you can copy it....

 

Start Code

//Split each of your column values into arrays
String[] RNo = ((String)globalMap.get("RNo")).split(",");  
String[] Name = ((String)globalMap.get("Name")).split(",");  
String[] AvgPrice = ((String)globalMap.get("AvgPrice")).split(","); 
String[] Price = ((String)globalMap.get("Price")).split(",");

//Find the max length of the arrays
int maxCount = Math.max(RNo.length,Math.max(Name.length, Math.max(AvgPrice.length,Price.length)));

//Open a For Loop to iterate over the arrays using the maxCount
for(int i=0; i<maxCount; i++){

 

Main Code

//For each column check that there is a value in the array and if so 
//set the column to be that value. Otherwise set it to null
row2.RNo = RNo.length>i ? RNo[i] : null;
row2.Name = Name.length>i ? Name[i] : null;
row2.AvgPrice = AvgPrice.length>i ? AvgPrice[i] : null;
row2.Price = Price.length>i ? Price[i] : null; 

End Code

// Close the For Loop
}

When I ran this job, I got the following output from the tLogRow....

Screenshot 2019-04-29 at 09.59.36.png

Six Stars

Re: Denormalize/Normalize data rows which have dynamic number of values

Thanks all. i was able to replicate the second solution. I need to work on a few tweaks but it seems to be the right approach.

 

Thanks again, i'm sure the reply posted above will help many others.

 

cheers

 

david

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Put Massive Amounts of Data to Work

Learn how to make your data more available, reduce costs and cut your build time

Watch Now

How OTTO Utilizes Big Data to Deliver Personalized Experiences

Read about OTTO's experiences with Big Data and Personalized Experiences

Blog

Talend Integration with Databricks

Take a look at this video about Talend Integration with Databricks

Watch Now