Implementation of SQL Analytical function like Row_Number(), Rank()

One Star

Implementation of SQL Analytical function like Row_Number(), Rank()

Hi
I want to implement SQL analytical function using Talend like 
ROW_NUMBER() OVER(PARTITION BY Emp_ID ORDER BY Emp_ID)

RANK() OVER(PARTITION BY Emp_ID ORDER BY Emp_ID).

I tried to use Numeric.Sequence but it has some issues and it is there with Sequence(Date, int, int) signature.
I have a pipe delimited flat file and created metadata and want to use metadata while working with analytical functions
How we can implement these Analytical function using Talend?
Thanks
Sixteen Stars

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

This can be done using a tSortRow, a tMap and Numeric.sequence. 
First of all, Numeric.sequence should have the signature "sequence(String seqName, int startValue, int step)". You will need to set the seqName value as a variable that changes according to your grouping (see "Partition By" in your description). 
To get this working, first you will need to use tSortRow to order your data by your "Partition By" requirements. Then (in the next component, the tMap) you will need to keep a track of changes to "Partition By" columns. When the group changes you will need to change the "seqName" value. This will start a new sequence.
You can check for changes between rows in a tMap following the tutorial below....
https://www.rilhia.com/quicktips/quick-tip-compare-row-value-against-value-previous-row
One Star

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

Hi
I was trying to use sequence function but the signature for me its (date, int, int). PF screen shot below


I also tried to change it to (string, int, int) but it does not allow any editing. 
This sequence function generates sequence based on string. Can we generate sequence based on Date or Int (Apart from string datatype)?
Thanks..
Sixteen Stars

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

The code for sequence should be....
    /**
* return an incremented numeric id
*
* {talendTypes} int | Integer
*
* {Category} Numeric
*
* {param} string("s1") sequence identifier
*
* {param} int(1) start value
*
* {param} int(1) step
*
* {example} sequence("s1", 1, 1) # 1, 2, 3, ...
*
* {example} sequence("s2", 100, -2) # 100, 98, 96, ...
*
*/
public static Integer sequence(String seqName, int startValue, int step) {
if (seq_Hash.containsKey(seqName)) {
seq_Hash.put(seqName, seq_Hash.get(seqName) + step);
return seq_Hash.get(seqName);
} else {
seq_Hash.put(seqName, startValue);
return startValue;
}
}

....I don't have 6.3.1 on my machine at the moment, but can only suggest that either this is a bug or a new overloaded version of sequence. Check to see if the old version still exists. If it does not, you can copy the code above and put it into your own routine.
One Star

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

Thanks, it worked. Can we create sequence for other data type as well?
Regards..
Sixteen Stars

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

The beauty of Talend is that you can create whatever method/function you want in Java. You can also make use of other people's functionality in third party APIs. So the answer is "yes". 
One Star

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

True. We can write our own libraries based on the requirements. I stuck in one scenario, I want to apply row_number once the data get loaded into the component and based on the row_number, I have to do some filtering. Is it possible? Attaching workflow with this...

Sixteen Stars

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

You can do this using the tMap. Do it either in an "out" table column or using a tMap variable.
Four Stars

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

Morning Rhall_2_0.  the link you provided below is a dead link.  Do you have one that works?  I'd like to see the solution that you provided.  I'm almost there but not quite...I'm trying to do:

partition data by Field1

order the data within partition by Field2

get the row_num()

 

thanks

Brad