Row Multiplication

Community Manager

Row Multiplication

Sometimes we need to multiply rows based on a column in that row, multiple columns in that row, or by some other factor dependent on the row. In Talend there is a really nice mechanism that allows this to be done quite simply....but it is not very well known about. This mini tutorial shows how to use this functionality with a very basic example.

Lets say we have some data which has a numeric column indicating how many times that row should be multiplied.  For example....

1|1y0jujayeB|3
2|cPXkYSxhEG|5
3|Dzb0e8Xw8B|4
4|PAMKnXNjYY|1

.....and the end column indicates how many times that row should be multiplied. So for the above, we should get the below.....

1|1y0jujayeB|3
1|1y0jujayeB|3
1|1y0jujayeB|3
2|cPXkYSxhEG|5
2|cPXkYSxhEG|5
2|cPXkYSxhEG|5
2|cPXkYSxhEG|5
2|cPXkYSxhEG|5
3|Dzb0e8Xw8B|4
3|Dzb0e8Xw8B|4
3|Dzb0e8Xw8B|4
3|Dzb0e8Xw8B|4
4|PAMKnXNjYY|1

How can this be done?

The below screenshot shows you a basic job which enables this functionality.

MultiplyRowsJob.png

The Random Data component simply produces some random data in the format "id" (an integer), "someText" (a String), and "repeatRow" (an integer). The first thing we need to do is talk about the tMap component.

 

1) "tMap_1" (tMap)

Below you can see the config of this component. I will talk about each of the important points which I identified by the coloured circles in the screenshot.

MultiplyRowsJob1.png

This component has a single lookup which returns an "id". We will talk about this component next, but it should be noted that it will supply the "id" field. It can supply whatever you wish, but I have just chosen to get it to return the "id" supplied by the main row.

The lookup configuration is the important thing here. Notice the red oval around "Reload at each row". This is important and must be set. This causes the lookup to be re-run on every row.  

 

The blue oval around "Inner join" isn't actually terribly important for variations on this. But I'll be honest, I circled it so felt I had to say something about it :-) 

 

The orange oval around the two "globalMap Key" fields is very important. When you select "Reload at each row", the "globalMap" section reveals itself. Normally, this will be used to supply a value to a globalMap variable, then for that variable to be used in a SQL query which will be fired. Then the result of that will be passed into the tMap. In this case, we are going to use a bit of Java to generate some data. What we are doing here is supplying the "repeatRow" and "id" values for the row to the globalMap HashMap. This will be used in the "Row Multiplier" tJavaFlex component.

 

2) "Row Multiplier" (tJavaFlex)

This component has 3 sections; Start Code, Main Code and End Code. We will be using this component to perform a loop to be iterated over the number of times supplied by the "repeatRow" value. The 3 code sections are described below....


Start Code

// start part of your Java code

for(int i = 0; i<((Integer)globalMap.get("repeatRow")).intValue(); i++){

This section defines the For loop. This is based on the value from the "repeatRow" column supplied to the globalMap.

 

Main Code

// here is the main part of the component,
// a piece of code executed in the row
// loop
row2.id = ((Integer)globalMap.get("id")).intValue();

This section simply supplies the "id" to the "id" column (this component has been configured to output a single "id" column). Since this is inside the For loop, it will be repeated for as many times as the loop is fired. Here is where you might want to carry out any logic required in your row multiplication.

 

End Code

// end of the component, outside/closing the loop
}

The For loop is closed here.

 

Once this has been configured, you can run your job and see that the rows have been multiplied according to your required logic.

Community Manager

Re: Row Multiplication

Nice post, thanks!
----------------------------------------------------------
Talend | Data Agility for Modern Business
Five Stars

Re: Row Multiplication

Thanks for sharing this information and its too informative....

Nine Stars

Re: Row Multiplication

Hello,

 

thanks for the idea of using a tJavaFlex in a tMap.

 

Another way to produce the same result is to use the tNormalize: on large dataset, it might give better performance (by avoiding to reload the mapping on each row).

 

javaw_qiMoPkliiX.png

The tMap adds a temporary column that contains a String made of "n" characters (n being the number of repetition needed).

The tNormalize then repeat the line that number of time and the tFilterColumns remove that temporary column.

 

Here is the content of the new column in the tMap :

new String(new char[in1.mult - 1]).replace("\0", "x");
  • "in1.mult" is the column indicating how many times a row should be multiplied (minus one for the tNormalize).
  • "x" must be the same character used in the tNormalize

For example, the "1|1y0jujayeB|3" row will produce "1|1y0jujayeB|3|xx"

 

The tNormalize then uses this new column to produce the multiple rows (using "x" as separator on the newly created column). As a single "x" will produce two output rows, we need to substract one in the tMap.

With the same example "1|1y0jujayeB|3|xx", the tNormalize will produce 3 rows (one with the empty string before the first "x", one with the empty string between both "x" and one with the empty string after the second "x") :

"1|1y0jujayeB|3|"

"1|1y0jujayeB|3|"

"1|1y0jujayeB|3|"

The tFilterColumns simply remove the extra column at the end.

 

When working over hundreds of thousands of rows, the difference between both methods can be seen (while it'll also depends on the number of repetition).

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog

How Media Organizations Achieved Success with Data Integration

Learn how media organizations have achieved success with Data Integration

Read

6 Ways to Start Utilizing Machine Learning with Amazon We Services and Talend

Look at6 ways to start utilizing Machine Learning with Amazon We Services and Talend

Blog