We have a requirement where for every entry in table A, we will do a full left join on table B and aggregate a value together based on the all the entries in table B into a record in table C. The aggregation is not a simple addition, but for sample purposes this can stand in for it. At the end state table C should have an equal number of row to table A with the same primary key, and a value column based on a value combined from all the rows in table B. To accomplish this efficiently, we are attempting to pre-build the entire table C as a hash, and then write it out just the once. To preform the aggregation we are feeding in the Hash Input to a map that then adds the values to the Hash Output. Problem is, we are not getting (just) the aggregated values:
If we select in the Hash Output "keep all" then while we do get the aggregated final values out, we also get all the intermediate value rows too, with no clear way to get just the last updated (and final sum) one. If we select "keep first" then we get the correct number of rows, but just the very first values, which is not helpful in this case.
What would seem like it would solve this problem, is if there was a "keep last" option, but we don't see that. Below is a sample job we've created to test the situation, but have not gotten it to provide the desired result (that being 1 record for each uniquely keyed record in the source but with the sum of all the previous associated values). Is there some way to simulate or get the effect of "keep last"? Or is there some combination for updating the records by key in the hash (not inserting new ones) that we have not realized? Or would some other component group do this better than hash? Thanks for your attention.
Thanks for the suggestion djdeejay. I don't believe that will work as the values from table B that are being computed are relative to table A, so table B can't be just read separately. However, I would be happy to be wrong about that... Simplified Illustration: input Table A: 3 rows single column (ID) each, values 1, 2, 3 input Table B: 4 rows, single column (Value) each, values 2, 4, 6, 8 Values in each row of result table should be sum all the products of the ID of the row x the each of the values. e.g. row 2's value would be 2 * 2 + 2 * 4 + 2* 6 + 2 * 8 result Table C: 3 rows of two columns (ID, Values), values: (1, 20), (2, 40), (3, 60) I am frankly baffled at this point that the hash functions really seem to much more stream focused than table focused, i.e. there does not appear to be a way to update the values there based on a key column. Any ideas appreciated!
OK, I am not going to claim I fully understand what you are going for here (an example with the inputs and out would help with this), but I can certainly help you with one of the initial issues you lighted. This one... "If we select in the Hash Output "keep all" then while we do get the aggregated final values out, we also get all the intermediate value rows too, with no clear way to get just the last updated (and final sum) one." If you are getting the aggregated values out along with intermediate values, you could try using the tAggregateRow component with your key column as the group key. Then select the "Last" function for each of the columns. So long as you can guarantee that the actual total amount is the last value per group (as would be expected), this should work.
Thanks much rhall, that works, at least in the small test scenario. We did look briefly at the AggregateRow widget to help with this, but were not aware of how the 'last' function worked, and it does indeed do the trick! The actual job is building up a large binary bit map per 'Deal' (table A) where each bit represents whether a 'User' (table B) has the right to see that row in their grid. Its a pain in the rear end and I don't like it, but we need a way to speed up getting the filtered list as calling the function that checks the ACL and runs the business logic was running a bit too slow. We've already implemented a different work around for generating the bitmap that instead of a Hash is building up the bitmap in a global variable; will look and see if getting the Hash back in will help further, but regardless of what we do in this case I think that - with your fix - this is a useful tool to have for the future. I would still recommend - not sure where to do so - that the HashOutput widget should be augmented by Talend to have a "Last Only" option. Sincerely, JD