One Star

tAggregateSortedRow ignoring last grouping

I'm using tAggregateSortedRow to dedup a selection on one key (tUniqRow won't work because I need the last row instead of the first). The component seems to be ignoring the last key group entirely. For example, with the following input data, I get only two output rows:
a,1
a,2
a,3
a,4
b,5
b,6
c,7
c,8
c,9
c,0
The output is:
a,4
b,6
I'm using 2.3.0 Java on WinXP 32bit. Any ideas?
Chris.
13 REPLIES
Community Manager

Re: tAggregateSortedRow ignoring last grouping

Hi
This is bug 2511, but it isn't fixed yet.
To solve this issue, you can add a null row as the last row in your input data to get the right result.
Best regards
shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: tAggregateSortedRow ignoring last grouping

I am new to TOS and had the same issue. It is nice that there is a warning but as a newbie I've a question on this.
Should I use a NULL row file and use tUnite anytime I want to use this object? Will it cause any unintended issues? I think it is better than using the tExternalSortedRow, as I could not get it to work.
Thanks in advance.
Sean
Community Manager

Re: tAggregateSortedRow ignoring last grouping

Hi Sean
"Should I use a NULL row file and use tUnite anytime I want to use this object? Will it cause any unintended issues?"

Currently, tAggregateSortedRow always ignores the last grouping (of course, it is a bug and we will fix it). For now, if you want this component to give the right result, you should add a null row as the last row.
Best regards
shong
Employee

Re: tAggregateSortedRow ignoring last grouping

"Currently, tAggregateSortedRow always ignores the last grouping (of course, it is a bug and we will fix it)."

I'm not so sure about this :-). See my notes in 503 and 2511.
One Star

Re: tAggregateSortedRow ignoring last grouping

plegall:
I tried to use a NULL file combined with tUnite. It did not work, maybe because I am not sure what a NULL row is in a flat-file context. Also, the Word doc in one of the bug trackers was difficult for me to follow (probably due to my newbie status with TOS).
In any case, tASR only makes sense if tSort is much better at sorting files than the aggregate component is (at this time I do not have any DB inputs, just Excel and flat files). Otherwise the regular aggregate can do the same job as Sort + tASR.
I come from the Informatica world, where the SORT component is very good compared to AGGREGATE, so I tend to use SORT and mark the AGGREGATE as "Sorted Input", which is the same as using tASR.
If the bugtracker entry on this is closed, I think it should be reopened, right?
Thank you for your help here and with other issues.
Regards,
Sean
One Star

Re: tAggregateSortedRow ignoring last grouping

I just ran into this issue. Do you have any feedback on which release the fix is planned for?
Employee

Re: tAggregateSortedRow ignoring last grouping

"I just ran into this issue. Do you have any feedback on which release the fix is planned for?"

As you can read in 2511, slanglois has added the solution I suggested, i.e. a new property "expected number of rows". He committed his code a very short time ago, and the solution will be available in the next milestone (not in the next main release, not in 2.3.2, but in 2.4.0M1). The solution was implemented only for the Java component, not the Perl component yet.
This is not a smart solution, but we currently have nothing smarter. Any suggestions are welcome.
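To illustrate the idea, here is a minimal plain-Java sketch of how a known row count could let a sorted-input aggregator emit its final group inside the per-row loop. It is only an assumption about how an "expected number of rows" value might be used, not the code committed for 2511; the class `ExpectedRowsSketch`, the method `lastPerKey` and the parameter `expectedRows` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Talend's generated code.
class ExpectedRowsSketch {

    // Emits the last row of each key group from key-sorted input.
    // expectedRows is the caller-supplied total row count (an assumed parameter).
    static List<String[]> lastPerKey(Iterable<String[]> sortedRows, int expectedRows) {
        List<String[]> out = new ArrayList<>();
        String[] pending = null; // aggregate of the group currently being read
        int seen = 0;
        for (String[] row : sortedRows) {
            seen++;
            if (pending != null && !pending[0].equals(row[0])) {
                out.add(pending); // key changed: the previous group is complete
            }
            pending = row; // "last" aggregate: the newest row of the group wins
            if (seen == expectedRows) {
                out.add(pending); // final row reached: emit the last group inside the loop
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (String line : "a,1 a,2 a,3 a,4 b,5 b,6 c,7 c,8 c,9 c,0".split(" ")) {
            rows.add(line.split(","));
        }
        for (String[] r : lastPerKey(rows, rows.size())) {
            System.out.println(r[0] + "," + r[1]);
        }
    }
}
```

In this sketch the result is only correct when `expectedRows` matches the real input size, which may be part of why the approach is described above as not smart.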
One Star

Re: tAggregateSortedRow ignoring last grouping

Wow... I have run into this bug in 4.1.0, and it is over two years later. Is this ever going to be fixed?
I can't imagine a more important component!
One Star

Re: tAggregateSortedRow ignoring last grouping

Good morning, gentlemen from the past.
I come from 2012 where we have TOS 5.0.1 to let you know that these promises are empty and this incredibly simple bug still hasn't been fixed.
There are simple ways to work around it by using tAggregateRow and, for cases of selection aggregate functions such as "first" or "max", you can use tSortRow with tUniqRow, so there's nothing vital about this component.
It is, however, a shameful blunder to keep such a harmful bug available to the uninformed user. My co-worker lost a week fruitlessly on this thinking her selects were wrong, and I lost two days until I got suspicious of the component.
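For readers unsure what the tSortRow plus tUniqRow workaround amounts to for a "max" selection, here is a rough plain-Java sketch: sort by key ascending and value descending, then keep the first row per key. It imitates the two steps and does not call Talend; `SortThenUniqSketch` and `maxPerKey` are made-up names, and it assumes the second column holds integers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it imitates the two components, it does not use Talend.
class SortThenUniqSketch {

    static List<String[]> maxPerKey(List<String[]> rows) {
        // Sort step: key ascending, then numeric value descending.
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort((x, y) -> {
            int byKey = x[0].compareTo(y[0]);
            return byKey != 0 ? byKey
                    : Integer.compare(Integer.parseInt(y[1]), Integer.parseInt(x[1]));
        });
        // Uniq step: keep only the first row seen for each key.
        List<String[]> out = new ArrayList<>();
        String previousKey = null;
        for (String[] row : sorted) {
            if (!row[0].equals(previousKey)) {
                out.add(row);
                previousKey = row[0];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (String line : "a,1 a,2 a,3 a,4 b,5 b,6 c,7 c,8 c,9 c,0".split(" ")) {
            rows.add(line.split(","));
        }
        for (String[] r : maxPerKey(rows)) {
            System.out.println(r[0] + "," + r[1]);
        }
    }
}
```

Note that this gives the maximum per key, not the last row in input order: for the sample data in the first post, key c yields 9 rather than 0. Getting the "last" row this way would need an extra sequence column to sort on.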
One Star

Re: tAggregateSortedRow ignoring last grouping

Anyone know if it's been fixed in 5.1.1?
Of course if tSortRow exposed the number of rows it has processed then it would make using tASR a lot more useful!
One Star

Re: tAggregateSortedRow ignoring last grouping

Ekevoo, I can tell you, from 2.5 years into your future, that it will not be solved for a long time, not without a big overhaul of Talend's technical design.
The problem is inherent to the sliding window approach tASR uses.
Since the tASR component gets sorted input, it can compare subsequent rows' keys to decide if it needs to keep aggregating:
LOOP over incoming rows
    IF key of aggregate == key of incoming row THEN keep aggregating (and don't output any row)
    ELSE output aggregate as new row, reinitialize aggregate with incoming row
END LOOP
When the loop ends, there will be an aggregate left that hasn't been output yet, corresponding to the last key in the sorted input. This remainder can be output just after the loop ends, but of course by that time it will be useless, since the following components' processing is already finished - it's contained in the same loop.
Conclusion: the last aggregate is always lost, there is no way around this. This is an inherent consequence of the processing model.
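The loop described above can be sketched in plain Java to show where the remainder ends up. This is a stand-alone illustration, not Talend's generated code; the class `SortedAggregateSketch` and the `flushAtEnd` flag are made up, and the output list stands in for the downstream components.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; this is not Talend's generated code.
class SortedAggregateSketch {

    // Keeps the last row seen for each key in key-sorted input.
    // flushAtEnd=false mimics the behaviour reported in this thread.
    static List<String[]> lastPerKey(List<String[]> sortedRows, boolean flushAtEnd) {
        List<String[]> out = new ArrayList<>();
        String[] aggregate = null;
        for (String[] row : sortedRows) {
            if (aggregate != null && !aggregate[0].equals(row[0])) {
                out.add(aggregate); // key changed: output the finished group
            }
            aggregate = row; // same key, or a new group: the newest row becomes the aggregate
        }
        // After the loop, the aggregate for the last key has not been output yet.
        if (flushAtEnd && aggregate != null) {
            out.add(aggregate);
        }
        return out;
    }

    static List<String[]> rows(String... lines) {
        List<String[]> result = new ArrayList<>();
        for (String line : lines) {
            result.add(line.split(","));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> in = rows("a,1", "a,2", "a,3", "a,4", "b,5", "b,6",
                "c,7", "c,8", "c,9", "c,0");
        for (String[] r : lastPerKey(in, false)) {
            System.out.println(r[0] + "," + r[1]);
        }
    }
}
```

With the sample data from the first post, `flushAtEnd=false` yields a,4 and b,6, matching the reported output, and `flushAtEnd=true` adds c,0. In this sketch the post-loop output is harmless because nothing downstream runs inside the loop; the point made above is that in the generated job the downstream processing is inside that loop, so an output issued after it has nowhere to go.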
Why would tASR ever be relevant if there is also a tAR? Well, in the real world we can't ignore the physical limits of memory. tAR must load all input before being able to aggregate. Imagine aggregating over a collection of 10 million records of 50 columns, when there are never more than 5 rows in any aggregate.
There is a much cheaper and simpler way to aggregate such large datasets, if they are organised usefully (i.e. sorted).
If, like me, you are in a situation where memory conservation is vital due to the large scale of the data being processed, the bug is not irrelevant, and the component is definitely not 'non vital'.
I find it astonishing that Talend support can suggest to "add a dummy row at the end of the flow" with a straight face. Talend is a code generator: it can generate code to add a dummy row for each aggregate component if that is really the only way to solve this.
I would expect a Talend bug to be solved in the Talend code, rather than in my process or data. That is not an unreasonable expectation, is it?
Seventeen Stars

Re: tAggregateSortedRow ignoring last grouping

I agree with you!

Re: tAggregateSortedRow ignoring last grouping

Awentzler, I completely agree with you. Stream-like behavior when performing set-processing is absolutely vital. This bug is a potential deal-breaker for most of our customers.
Is there no flush signal being passed through the flow when the input component has issued its final row? There should be, so the tASR (and all other components for that matter) gets a chance to clean up before shutting down.
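As a sketch of what such a flush signal could look like, here is a small push-style pipeline in plain Java in which every stage receives an end-of-input call and can emit whatever it is still holding. None of this is Talend API; `RowSink`, `LastPerKeyStage`, `CollectingSink` and `FlushDemo` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical push-style pipeline; these types are not part of Talend.
interface RowSink {
    void onRow(String[] row);
    void onEnd(); // the "flush signal": called once after the final row
}

// Sorted-input stage that forwards the last row of each key group.
class LastPerKeyStage implements RowSink {
    private final RowSink downstream;
    private String[] pending; // aggregate of the group currently being read

    LastPerKeyStage(RowSink downstream) {
        this.downstream = downstream;
    }

    @Override
    public void onRow(String[] row) {
        if (pending != null && !pending[0].equals(row[0])) {
            downstream.onRow(pending); // key changed: forward the finished group
        }
        pending = row;
    }

    @Override
    public void onEnd() {
        if (pending != null) {
            downstream.onRow(pending); // flush the last group before shutting down
            pending = null;
        }
        downstream.onEnd(); // pass the signal along to the next stage
    }
}

// Terminal stage that simply records what it receives.
class CollectingSink implements RowSink {
    final List<String[]> rows = new ArrayList<>();
    boolean ended = false;

    @Override
    public void onRow(String[] row) {
        rows.add(row);
    }

    @Override
    public void onEnd() {
        ended = true;
    }
}

class FlushDemo {
    public static void main(String[] args) {
        CollectingSink sink = new CollectingSink();
        RowSink stage = new LastPerKeyStage(sink);
        for (String line : "a,1 a,2 a,3 a,4 b,5 b,6 c,7 c,8 c,9 c,0".split(" ")) {
            stage.onRow(line.split(","));
        }
        stage.onEnd(); // without this call the c group would never reach the sink
        for (String[] r : sink.rows) {
            System.out.println(r[0] + "," + r[1]);
        }
    }
}
```

The design choice here is that the last group is emitted in response to the end-of-input call while downstream stages are still alive, so the aggregator needs neither a dummy row nor a row count.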
Cheers,