Implementation of SQL Analytical function like Row_Number(), Rank()

Highlighted
One Star

Implementation of SQL Analytical function like Row_Number(), Rank()

Hi
I want to implement SQL analytical function using Talend like 
ROW_NUMBER() OVER(PARTITION BY Emp_ID ORDER BY Emp_ID)

RANK() OVER(PARTITION BY Emp_ID ORDER BY Emp_ID).

I tried to use Numeric.Sequence but it has some issues and it is there with Sequence(Date, int, int) signature.
I have a pipe delimited flat file and created metadata and want to use metadata while working with analytical functions
How we can implement these Analytical function using Talend?
Thanks
Highlighted
Community Manager

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

This can be done using a tSortRow, a tMap and Numeric.sequence. 
First of all, Numeric.sequence should have the signature "sequence(String seqName, int startValue, int step)". You will need to set the seqName value as a variable that changes according to your grouping (see "Partition By" in your description). 
To get this working, first you will need to use tSortRow to order your data by your "Partition By" requirements. Then (in the next component, the tMap) you will need to keep a track of changes to "Partition By" columns. When the group changes you will need to change the "seqName" value. This will start a new sequence.
You can check for changes between rows in a tMap following the tutorial below....
https://www.rilhia.com/quicktips/quick-tip-compare-row-value-against-value-previous-row
Highlighted
One Star

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

Hi
I was trying to use sequence function but the signature for me its (date, int, int). PF screen shot below


I also tried to change it to (string, int, int) but it does not allow any editing. 
This sequence function generates sequence based on string. Can we generate sequence based on Date or Int (Apart from string datatype)?
Thanks..
Highlighted
Community Manager

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

The code for sequence should be....
    /**
* return an incremented numeric id
*
* {talendTypes} int | Integer
*
* {Category} Numeric
*
* {param} string("s1") sequence identifier
*
* {param} int(1) start value
*
* {param} int(1) step
*
* {example} sequence("s1", 1, 1) # 1, 2, 3, ...
*
* {example} sequence("s2", 100, -2) # 100, 98, 96, ...
*
*/
public static Integer sequence(String seqName, int startValue, int step) {
if (seq_Hash.containsKey(seqName)) {
seq_Hash.put(seqName, seq_Hash.get(seqName) + step);
return seq_Hash.get(seqName);
} else {
seq_Hash.put(seqName, startValue);
return startValue;
}
}

....I don't have 6.3.1 on my machine at the moment, but can only suggest that either this is a bug or a new overloaded version of sequence. Check to see if the old version still exists. If it does not, you can copy the code above and put it into your own routine.
Highlighted
One Star

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

Thanks, it worked. Can we create sequence for other data type as well?
Regards..
Highlighted
Community Manager

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

The beauty of Talend is that you can create whatever method/function you want in Java. You can also make use of other people's functionality in third party APIs. So the answer is "yes". 
Highlighted
One Star

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

True. We can write our own libraries based on the requirements. I stuck in one scenario, I want to apply row_number once the data get loaded into the component and based on the row_number, I have to do some filtering. Is it possible? Attaching workflow with this...

Highlighted
Community Manager

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

You can do this using the tMap. Do it either in an "out" table column or using a tMap variable.
Highlighted
Six Stars

Re: Implementation of SQL Analytical function like Row_Number(), Rank()

Morning Rhall_2_0.  the link you provided below is a dead link.  Do you have one that works?  I'd like to see the solution that you provided.  I'm almost there but not quite...I'm trying to do:

partition data by Field1

order the data within partition by Field2

get the row_num()

 

thanks

Brad

2019 GARTNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

Best Practices for Using Context Variables with Talend – Part 2

Part 2 of a series on Context Variables

Blog

Best Practices for Using Context Variables with Talend – Part 1

Learn how to do cool things with Context Variables

Blog

Migrate Data from one Database to another with one Job using the Dynamic Schema

Find out how to migrate from one database to another using the Dynamic schema

Blog