My data set has only a few duplicates (based on a combination of business keys) that I need to aggregate (facts) before loading the atomic data into the DWH. I am using tAggregateRow for this, and now I want to find out which rows were aggregated. Is there a way to do this?
Split your data before putting it through the tAggregateRow. I'm assuming you aggregate by a key (or keys). As long as the key(s) are kept, you can refer back to the original rows. If you are looking for some sort of list of keys on your aggregated row, you can select the "list" function for an output column in the "Operations" section of the tAggregateRow.
Hi, thank you for the suggestion. How did you mean to split it? I just tried tUniqRow and sent the rejected rows (duplicates) to another flow, which then proceeds to tAggregateRow. But this doesn't work.
If you only need to know which rows were aggregated after the fact, append a column to your flow containing a constant 1 and sum that column when you perform your aggregate. Any output row where the sum is greater than 1 was built from more than one input row, i.e. it was aggregated.
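To make the counting trick concrete, here is a minimal sketch of the same logic in plain Java. It is not Talend-generated code and does not use any Talend API; the class `AggregationMarker`, the `Row` and `Aggregated` types, and the field names are all hypothetical and only illustrate the idea. Each input row carries an implicit 1, the 1s are summed per business key alongside the fact, and a sum above 1 flags an aggregated row.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggregationMarker {

    /** Hypothetical input row: one business key and one numeric fact. */
    public static final class Row {
        public final String key;
        public final double amount;

        public Row(String key, double amount) {
            this.key = key;
            this.amount = amount;
        }
    }

    /** Hypothetical output row: summed fact plus the summed marker column. */
    public static final class Aggregated {
        public final String key;
        public final double amount;
        public final int rowCount; // sum of the constant 1 added to every input row

        public Aggregated(String key, double amount, int rowCount) {
            this.key = key;
            this.amount = amount;
            this.rowCount = rowCount;
        }

        /** True when more than one input row was folded into this output row. */
        public boolean wasAggregated() {
            return rowCount > 1;
        }
    }

    /** Groups rows by key, summing the fact and the marker column together. */
    public static Map<String, Aggregated> aggregate(List<Row> rows) {
        Map<String, Aggregated> out = new LinkedHashMap<>();
        for (Row r : rows) {
            // Every input row starts with a marker value of 1.
            Aggregated single = new Aggregated(r.key, r.amount, 1);
            out.merge(r.key, single, (a, b) ->
                    new Aggregated(a.key, a.amount + b.amount, a.rowCount + b.rowCount));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("A", 10.0), new Row("B", 5.0), new Row("A", 2.5));
        for (Aggregated a : aggregate(rows).values()) {
            System.out.println(a.key + " amount=" + a.amount
                    + " rowCount=" + a.rowCount
                    + (a.wasAggregated() ? " (aggregated)" : ""));
        }
    }
}
```

In a Talend job the constant column would typically be added in a component placed before the tAggregateRow (for example a tMap output column with the expression `1`), with the sum function applied to it in the aggregate; treat that wiring as an assumption about your job layout rather than a fixed recipe.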