My data set has only a few duplicates (based on a combination of business keys) that I need to aggregate (facts) before loading the atomic data into the DWH. I am using tAggregateRow for this, and now I want to find out which rows were aggregated. Is there a way to do this?
Split your data before putting it through the tAggregateRow. I'm assuming you aggregate by a key (or keys). As long as the key(s) are kept, you can refer back to the original rows. If you are looking for some sort of list of keys on your aggregated row, you can select the list function for this in the "Operations" section of the tAggregateRow.
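To illustrate the idea of keeping the key so you can get back to the original rows, here is a minimal plain-Java sketch. It is not Talend-generated code, and the class name, method name and the two-column row layout {businessKey, rowId} are made up for the example. It only shows what collecting a list per key amounts to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class ListPerKeySketch {
    // Each input row is {businessKey, rowId}; both columns are assumptions.
    static Map<String, List<String>> rowIdsPerKey(List<String[]> rows) {
        Map<String, List<String>> idsByKey = new LinkedHashMap<>();
        for (String[] row : rows) {
            // Group by the business key and remember which rows fed each group.
            idsByKey.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return idsByKey;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"A", "1"},
                new String[] {"B", "2"},
                new String[] {"A", "3"});
        // Keys whose list holds more than one row id were aggregated.
        System.out.println(rowIdsPerKey(rows));
    }
}
```

Any key whose list contains more than one row id corresponds to rows that were merged by the aggregation.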
Hi, thank you for the suggestion. How did you mean to split it? I just tried tUniqRow and sending the rejected rows (duplicates) to another flow, which then proceeds to tAggregateRow. But this doesn't work.
When I said "split it", I meant to send the same data down two different paths: one to aggregate and one to keep in its granular state. You can do this with several components, but a tMap is probably the one you are most familiar with.
If you only need to know which rows were aggregated after the fact, append a column to your flow containing the constant 1 and sum that column when you perform your aggregate. Any row in the output where the sum is greater than 1 was aggregated.
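The marker-column trick can be sketched in plain Java like this. It is only a sketch of the logic, not what Talend generates, and the class, field and method names (and the single `amount` fact) are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; illustration of the logic only.
class MarkerColumnSketch {
    static class Total {
        double amount;  // summed fact
        int rowCount;   // summed marker column (every input row contributes 1)
    }

    // keys[i] and amounts[i] together represent one input row.
    static Map<String, Total> aggregate(String[] keys, double[] amounts) {
        Map<String, Total> totals = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            Total t = totals.computeIfAbsent(keys[i], k -> new Total());
            t.amount += amounts[i];
            t.rowCount += 1; // the appended constant-1 column, summed per key
        }
        return totals;
    }

    // Keys whose summed marker is greater than 1 came from several input rows.
    static List<String> aggregatedKeys(Map<String, Total> totals) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Total> e : totals.entrySet()) {
            if (e.getValue().rowCount > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] keys = {"A", "B", "A"};
        double[] amounts = {10.0, 5.0, 2.5};
        System.out.println(aggregatedKeys(aggregate(keys, amounts)));
    }
}
```

In the job itself, the constant 1 would be added upstream of the tAggregateRow (for example in a tMap output column) and summed alongside the facts.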