Select multiple rows from tUniqRow Component

Hello,
I have data that is sorted by a key, and I have calculated a weighted sum for each row.
For example:

Line  Key  WeightedSum
1     1    1
2     1    2
3     1    2
4     2    1
5     2    2

In the output file, I want to select lines 2 and 3 for key 1, and line 5 for key 2.

I tried tUniqRow after sorting, but tUniqRow keeps only one row per key, so I can't get multiple rows for the same key. Any suggestions?

Re: Select multiple rows from tUniqRow Component

Is there another way to achieve this, perhaps using a different set of components?

Re: Select multiple rows from tUniqRow Component

Hi Ayan,

I don't understand the rule behind your expected output.
Why lines 2 and 3 for key 1, but only line 5 for key 2? What exactly are you expecting, functionally?

Regards
Laurent

Re: Select multiple rows from tUniqRow Component

Is the algorithm meant to select all rows with the greatest WeightedSum for each Key value? If so, you can do it as follows. It is a little complicated, but it is a really good example (the problem, not necessarily my explanation :) ) of how a tMap can be used to compare values across rows.

1) Sort your data by Key and WeightedSum (in that order). Make sure that WeightedSum is sorted in descending order.
2) Create a tMap and add 6 variables: lastKey, thisKey, lastWeightedSum, thisWeightedSum, firstWeightedSum and useRow. They must be in that order, because variables are evaluated in the order in which they appear in the list (top to bottom).
3) Connect your Key column to the "thisKey" variable and assign the value of "thisKey" to "lastKey". As "lastKey" is processed before "thisKey", in the first row it will be NULL. Variable values are stored between rows, so for the next row "lastKey" will hold the last Key value while "thisKey" will be given the current row value. 
4) Connect your WeightedSum column to the "thisWeightedSum" variable and assign the value of "thisWeightedSum" to "lastWeightedSum". As "lastWeightedSum" is processed before "thisWeightedSum", in the first row it will be NULL. Variable values are stored between rows, so for the next row "lastWeightedSum" will hold the last WeightedSum value while "thisWeightedSum" will be given the current row value.
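5) For your "firstWeightedSum" variable, use the following logic. Because "lastKey" must be able to hold null, the Key and WeightedSum variables need object types (for example Integer rather than int). In Java, == and != compare such objects by reference, not by value, so compare them with equals():

(Var.lastKey == null || !Var.lastKey.equals(Var.thisKey)) ? Var.thisWeightedSum : Var.firstWeightedSum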

This captures the highest WeightedSum for each Key (which only works if WeightedSum is sorted in descending order) and keeps that value for as long as the Key stays the same. It is used with the "useRow" variable to identify the rows to keep.
6) Use the following expression for the "useRow" variable, which must be a boolean variable. Key and WeightedSum are compared with equals() because nullable variables hold objects (for example Integer), and == or != would compare references rather than values:

Var.lastKey == null || !Var.lastKey.equals(Var.thisKey) || Var.firstWeightedSum.equals(Var.thisWeightedSum)


This expression uses the following logic:

a) If "lastKey" is null, this must be the first row, and we want it (since the data is sorted by Key and WeightedSum, the first record is always required).
b) If "lastKey" does not match "thisKey", the row starts a new Key group, and the first record of each Key group is always required.
c) If "thisKey" equals "lastKey" and "firstWeightedSum" equals "thisWeightedSum", the record is required, because it has the same Key as the previous row and shares the group's first (highest) WeightedSum.

You can then use the "useRow" variable as the filter expression on your tMap output to keep only the required rows.
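To check the logic outside Talend, the steps above can be simulated in plain Java. This is a minimal sketch, not Talend-generated code: the names TMapVarSimulation, Vars, evaluate and select are hypothetical. Each field of Vars stands in for one tMap variable and keeps its value between rows, the fields are updated in the same top-to-bottom order as the variables, and the sort in select() stands in for step 1. Key and WeightedSum are assumed to be Integer and are compared with equals(), since with Integer objects == and != compare references.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TMapVarSimulation {
    // One input row; Integer (not int) so the types match nullable tMap variables (assumption).
    record Row(int line, Integer key, Integer weightedSum) {}

    // Stand-in for the tMap Var table: fields keep their values from one row to the next.
    static class Vars {
        Integer lastKey, thisKey, lastWeightedSum, thisWeightedSum, firstWeightedSum;
        boolean useRow;

        // Evaluate the variables top to bottom for one row, in the order from step 2.
        void evaluate(Row row) {
            lastKey = thisKey;                 // thisKey still holds the previous row's Key (null on row 1)
            thisKey = row.key();
            lastWeightedSum = thisWeightedSum; // step 4; not needed by the expressions below
            thisWeightedSum = row.weightedSum();
            // Step 5: reset at the start of each Key group, otherwise keep the group's first value.
            firstWeightedSum = (lastKey == null || !lastKey.equals(thisKey))
                    ? thisWeightedSum
                    : firstWeightedSum;
            // Step 6: keep the row if it starts a group or ties with the group's first (highest) value.
            useRow = lastKey == null
                    || !lastKey.equals(thisKey)
                    || firstWeightedSum.equals(thisWeightedSum);
        }
    }

    static List<Row> select(List<Row> input) {
        // Step 1: Key ascending, then WeightedSum descending.
        List<Row> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing(Row::key)
                .thenComparing(Row::weightedSum, Comparator.reverseOrder()));

        Vars vars = new Vars();
        List<Row> output = new ArrayList<>();
        for (Row row : sorted) {
            vars.evaluate(row);
            if (vars.useRow) {   // plays the role of the filter on the tMap output
                output.add(row);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Sample data from the question: (Line, Key, WeightedSum).
        List<Row> rows = List.of(
                new Row(1, 1, 1), new Row(2, 1, 2), new Row(3, 1, 2),
                new Row(4, 2, 1), new Row(5, 2, 2));
        for (Row r : select(rows)) {
            System.out.println("Line " + r.line() + ", Key " + r.key()
                    + ", WeightedSum " + r.weightedSum());
        }
    }
}
```

On the sample data from the question this should print lines 2, 3 and 5. The first two conditions of useRow are strictly redundant, because firstWeightedSum is reset to thisWeightedSum at the start of every Key group, but they are kept to mirror a) to c).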