My data set has only a few duplicates (based on a combination of business keys) that I need to aggregate (facts) before loading the atomic data into the DWH. I am using tAggregateRow for this, and now I want to find out which rows were aggregated. Is there a way to do this?
Split your data before putting it through tAggregateRow. I'm assuming you aggregate by a key (or keys). As long as the key(s) are kept, you can refer back to the original rows. If you want some sort of list of values on your aggregated row, you can select the "list" function for that column in the "Operations" section of tAggregateRow.
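To illustrate the "list" idea outside Talend, here is a minimal plain-Java sketch (not Talend-generated code). It assumes each input row carries a hypothetical unique `rowId` column alongside the business key and a fact column `amount`; grouping by the key and collecting the `rowId`s shows which original rows ended up in each aggregated row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggregateWithList {
    // Hypothetical input row: rowId identifies the original record,
    // businessKey is the (combined) business key, amount is a fact column.
    record Row(int rowId, String businessKey, double amount) {}

    // One output row per business key: the summed fact plus the list of
    // rowIds merged into it (similar in spirit to the "list" function).
    record AggRow(String businessKey, double totalAmount, List<Integer> rowIds) {}

    static List<AggRow> aggregate(List<Row> rows) {
        // LinkedHashMap keeps the keys in first-seen order.
        Map<String, List<Row>> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(r.businessKey(), k -> new ArrayList<>()).add(r);
        }
        List<AggRow> result = new ArrayList<>();
        for (Map.Entry<String, List<Row>> e : groups.entrySet()) {
            double total = 0;
            List<Integer> ids = new ArrayList<>();
            for (Row r : e.getValue()) {
                total += r.amount();
                ids.add(r.rowId());
            }
            result.add(new AggRow(e.getKey(), total, ids));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> input = List.of(
            new Row(1, "A|2020", 10.0),
            new Row(2, "B|2020", 5.0),
            new Row(3, "A|2020", 7.5));
        for (AggRow a : aggregate(input)) {
            // Any row whose list has more than one id was aggregated.
            System.out.println(a.businessKey() + " total=" + a.totalAmount() + " rows=" + a.rowIds());
        }
    }
}
```

With this input, key `A|2020` collects rows 1 and 3, while `B|2020` keeps only row 2.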
Hi, thank you for the suggestion. What did you mean by splitting it? I just tried tUniqRow and sent the rejected (duplicate) rows to another flow, which then goes through tAggregateRow. But this doesn't work.
When I said "split it", I meant sending the same data down different paths: one to aggregate and one to keep in its granular state. You can do this with several components, but tMap is probably the one you are most familiar with.
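The "two paths" approach can be sketched in plain Java as follows. This is an illustration of the data flow only, not Talend code; the column names `rowId`, `businessKey` and `amount` are hypothetical. One copy of the rows is aggregated by key, the other copy stays granular, and joining the granular rows back on the key lets each original row see the aggregate it belongs to.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SplitAndRejoin {
    // Hypothetical granular row.
    record Row(int rowId, String businessKey, double amount) {}

    // Granular row enriched with the aggregated total for its key.
    record EnrichedRow(int rowId, String businessKey, double amount, double keyTotal) {}

    // Path 1: aggregate one copy of the flow by business key.
    static Map<String, Double> aggregatePath(List<Row> rows) {
        Map<String, Double> totals = new HashMap<>();
        for (Row r : rows) {
            totals.merge(r.businessKey(), r.amount(), Double::sum);
        }
        return totals;
    }

    // Path 2: keep the granular rows and look up their aggregate by key,
    // i.e. what a lookup/join on the business key would do.
    static List<EnrichedRow> rejoin(List<Row> rows, Map<String, Double> totals) {
        return rows.stream()
            .map(r -> new EnrichedRow(r.rowId(), r.businessKey(), r.amount(),
                                      totals.get(r.businessKey())))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row(1, "A|2020", 10.0),
            new Row(2, "B|2020", 5.0),
            new Row(3, "A|2020", 7.5));
        // The same input feeds both paths.
        Map<String, Double> totals = aggregatePath(rows);
        rejoin(rows, totals).forEach(System.out::println);
    }
}
```

The design point is that the key is never dropped on either path, which is what makes the join back to the original rows possible.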
If you only need to know afterwards which rows were aggregated, append a column containing the value 1 to your flow and sum that column when you perform the aggregation. Any output row where the sum isn't 1 was aggregated.
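Here is a minimal plain-Java sketch of that trick, under the assumption of a hypothetical input with a `businessKey` and a fact column `amount`. It appends a constant column `one` set to 1, sums it per key together with the fact, and flags any aggregated row whose count is not 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountColumnFlag {
    // Hypothetical input row.
    record Row(String businessKey, double amount) {}

    // Input row with the appended constant column (always 1).
    record FlaggedRow(String businessKey, double amount, int one) {}

    // Aggregated row: summed fact plus the summed "one" column.
    record AggRow(String businessKey, double totalAmount, int rowCount) {
        boolean wasAggregated() { return rowCount != 1; }
    }

    static List<AggRow> aggregate(List<Row> rows) {
        // Step 1: append a column containing 1 to every row.
        List<FlaggedRow> flagged = new ArrayList<>();
        for (Row r : rows) {
            flagged.add(new FlaggedRow(r.businessKey(), r.amount(), 1));
        }
        // Step 2: group by key, summing both the fact and the new column.
        Map<String, AggRow> groups = new LinkedHashMap<>();
        for (FlaggedRow f : flagged) {
            groups.merge(f.businessKey(),
                new AggRow(f.businessKey(), f.amount(), f.one()),
                (a, b) -> new AggRow(a.businessKey(),
                                     a.totalAmount() + b.totalAmount(),
                                     a.rowCount() + b.rowCount()));
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<Row> input = List.of(
            new Row("A|2020", 10.0),
            new Row("B|2020", 5.0),
            new Row("A|2020", 7.5));
        // Step 3: keep only the output rows whose summed column isn't 1.
        for (AggRow a : aggregate(input)) {
            if (a.wasAggregated()) {
                System.out.println(a.businessKey() + " merged " + a.rowCount() + " rows");
            }
        }
    }
}
```

Only `A|2020` is reported here, because two input rows were merged into it.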