Five Stars

how to find the consecutive integers in a column?

Hi all,

 

I have a csv file which contains a column called tokenID. The tokenID column is shown in the pic1. And I want to transform it to what it is in pic2. All the consecutive integers would be assigned the value which is the last integer in the set of consecutive integers. Does anyone know how to do the transformation in Talend or in Java code. Thanks a lot.

pic1pic1pic2pic2

  • Dataprep
1 ACCEPTED SOLUTION

Accepted Solutions
Nine Stars

Re: how to find the consecutive integers in a column?

My approach would be to add a column that contains a counter which only increments when the difference between the current value and the prior is more than 1. (You don't say how to handle adjacent identical values, if any, but I'll assume they can be grouped with the consecutive values.)

tMemorizeRows (https://www.talendbyexample.com/talend-tmemorizerows-component-reference.html) or tMap variables (https://www.rilhia.com/quicktips/quick-tip-compare-row-value-against-value-previous-row) can allow you to compare current values to prior values.

Done right, this should give each group of consecutive values its own key. A tAggregateRows component can find the maximum value in each group. Joining the list of max values to the original set will let you pair up each value with its group maximum.

I'm pecking this out on my phone, otherwise I'd do a proof of concept Job with screenshots and formula text, but the links above (all credit to their authors) should give you a good start. If you need help with any specifics, write back and I'll go into more detail when I'm back at my desk.
5 REPLIES
Nine Stars

Re: how to find the consecutive integers in a column?

Is the list of IDs always sorted or are you looking for consecutive numbers within an otherwise unordered list?

If sorted, does the original order matter?
Five Stars

Re: how to find the consecutive integers in a column?

Yes, the list of IDs is sorted, and I do not want to change the order. Thanks!

Nine Stars

Re: how to find the consecutive integers in a column?

My approach would be to add a column that contains a counter which only increments when the difference between the current value and the prior is more than 1. (You don't say how to handle adjacent identical values, if any, but I'll assume they can be grouped with the consecutive values.)

tMemorizeRows (https://www.talendbyexample.com/talend-tmemorizerows-component-reference.html) or tMap variables (https://www.rilhia.com/quicktips/quick-tip-compare-row-value-against-value-previous-row) can allow you to compare current values to prior values.

Done right, this should give each group of consecutive values its own key. A tAggregateRows component can find the maximum value in each group. Joining the list of max values to the original set will let you pair up each value with its group maximum.

I'm pecking this out on my phone, otherwise I'd do a proof of concept Job with screenshots and formula text, but the links above (all credit to their authors) should give you a good start. If you need help with any specifics, write back and I'll go into more detail when I'm back at my desk.
Nine Stars

Re: how to find the consecutive integers in a column?

If the list order is easy to restore by sorting it again, then I'd reverse the sort, and instead of a counter to group consecutive values, I'd use tMap variables to keep track of the current maximum value, only changing it when the difference between the current and prior values is greater than 1. This should let you get the maximum group value in one pass. Then you could sort the values back to their original order.
Five Stars

Re: how to find the consecutive integers in a column?

Thank you for your great idea.  I will try those components.