Five Stars

[resolved] Get the most recent date using tAggregateRow

Hi,
I am trying to extract the most recent date from a flow and I need some support.
In this job getting all license history for a based on the idlicense (see tmap screenshot) and then I want to keep only the most recent date using a tAggregateRow. I've been playing with taggregaterow and so far only succeed to have partial solution.
SOLUTION 1:
Set taggregate with:
Group by: license_idlicense
Operation:
- Output column: Start_Date
- Function: Max
- Input column position: Start_Date
When I do this, the aggregation works great and I have only the max date in the output. However, only value for license_idlicense and the date is send in the output. Even if I think I set the schema properly (see taggregaterow_schema screenshot)
SOLUTION 2:
Same settings as describe before with one exception: I added all the field in the group by of the aggregaterow (see taggregate screenshot). This time I have all the data I want in the output but the aggregation doesn't work anymore

I've been looking around for tutorials and / or previous post to use tAggregateRow with date with no success.
Thanks in advance for your support!
1 ACCEPTED SOLUTION

Accepted Solutions
Seven Stars

Re: [resolved] Get the most recent date using tAggregateRow

If you want get the record with the max start date for each license_idlicense, the best way is to sort the flow by license_idlicense and descending start date and then use tUniqRow keyed on license_idlicense to drop all rows after the first for each license_idlicense.
7 REPLIES
Moderator

Re: [resolved] Get the most recent date using tAggregateRow

Hi,
Could you please elaborate your case with an example with input and expected output values? Perhaps we can design a demo job with tAggregateRow
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Community Manager

Re: [resolved] Get the most recent date using tAggregateRow

Hi
You did't use tAggregateRow correctly with solution 2, in your case, the group element should license_idlicense as you did in solution1. The max operation will only count one value, the max value, it is normal in solution1.

Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
Five Stars

Re: [resolved] Get the most recent date using tAggregateRow

Hi shong and thanks for your message, you confirmed my thought regarding usage of tAggregateRow.
So I configure it with only (see screen shot)
- Group by on license_idlicense
- Operation on Start_Date
- sync columns to transfer the schema
... and my output contains only value for the two fields mentioned previously. This is an extract of the tLogRow:
Starting job L031_test at 12:10 23/05/2013.

connecting to socket on port 3818
connected
|0||01-06-2003||||461
|0||01-08-2007||||464
|0||01-05-2004||||457
|0||01-01-1900||||943
|0||01-12-1997||||931
|0||01-08-1989||||456
|0||01-01-1900||||934
|0||01-08-1988||||451
|0||01-10-1986||||933
disconnected

what should I do to get value for all fields in the outputs stream?
thanks
Seven Stars

Re: [resolved] Get the most recent date using tAggregateRow

How can you expect to carry data through if you're not grouping by it or doing some operation on it? This just like SQL.
Five Stars

Re: [resolved] Get the most recent date using tAggregateRow

Using the example set previously, I thought the tAggregateRow will
1. group records by license_idlicense
2. extract the full record where the start_date is the max and not only the start_date value
So if I get your answer right alevy, I should add a tmap after my tAggregateRow to retrieve other value for each record. I though about this but I was looking for a more straightforward solution.
thanks for your support.
Seven Stars

Re: [resolved] Get the most recent date using tAggregateRow

If you want get the record with the max start date for each license_idlicense, the best way is to sort the flow by license_idlicense and descending start date and then use tUniqRow keyed on license_idlicense to drop all rows after the first for each license_idlicense.
Five Stars

Re: [resolved] Get the most recent date using tAggregateRow

alevy, thanks for your answer.
I added a tmap after the tAggregateRow to get back all data I need, but I keep your process in the back of my mind for next time!