Below is an example of what I am trying to accomplish in Talend, and I am interested in knowing how it can be done. To those familiar with Kimball data marts, the question is: how can I look up the surrogate key from a dimension table (that uses SCD2) when loading data into the fact table?

Example/Description: I have a customer dimension that is populated using SCD2 for the address columns. A customer changes her address on 10/10/2007. This means that the currently active record (with the old address) would be terminated on 10/10/2007, and a new row (with the new address) would be inserted into the customer dimension with an effective date of 10/10/2007.

Now, suppose I am loading transactions (into the fact table) for the month of October 2007 into the data mart. I want the transactions before 10/10/2007 to use the customer surrogate key based on the old address, and the transactions after the address change to use the new customer surrogate key that corresponds to the new address.

To get the right customer surrogate key based on the transaction date/time, I need a way to look up the surrogate key where:
1. The business key from the stream (customer_id) matches the business key in the dimension, AND
2. The transaction date/time (from the source) falls between the dimension record's effective and expire dates.

So my question is: what would be the best way to do the above surrogate key lookup, to make sure that the appropriate surrogate key from the dimension is loaded into the fact table? Please let me know if there is any additional information I can provide. Thanks!
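To make the two-part match condition concrete, here is a minimal Python sketch of the lookup logic (hypothetical data, surrogate keys, and dates, purely illustrative; a real Talend job would do this in a lookup component or in SQL rather than in a Python loop):

```python
from datetime import date

# Hypothetical SCD2 customer dimension: each version of a customer
# carries its own surrogate key plus an effective/expire date range.
customer_dim = [
    {"sk": 101, "customer_id": "C1", "address": "old address",
     "effective": date(2000, 1, 1), "expire": date(2007, 10, 9)},
    {"sk": 102, "customer_id": "C1", "address": "new address",
     "effective": date(2007, 10, 10), "expire": date(9999, 12, 31)},
]

def lookup_surrogate_key(customer_id, txn_date):
    """Return the surrogate key of the dimension version whose
    effective/expire range covers the transaction date."""
    for row in customer_dim:
        if (row["customer_id"] == customer_id          # condition 1: business key
                and row["effective"] <= txn_date <= row["expire"]):  # condition 2: date range
            return row["sk"]
    return None  # unmatched business key or a gap in the date ranges

# A transaction before the address change resolves to the old version...
print(lookup_surrogate_key("C1", date(2007, 10, 5)))   # 101
# ...and one after it resolves to the new version.
print(lookup_surrogate_key("C1", date(2007, 10, 15)))  # 102
```

The same condition expressed in SQL would be a join on the business key plus a BETWEEN on the effective/expire columns.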
Thank you, Stephane. Your technique will let me achieve what I need. In terms of functionality, it would be great to have a component that does such a comparison and lookup by processing on the database side, eliminating the need to:
1. Stream all dimension rows over the network. If the mapper waits for all rows to load before it starts processing, this could be a major bottleneck, especially if only a small percentage of the dimension rows are needed for loading the fact table. And
2. Store them in memory for the mapper. If the dimension is big, Talend might write to disk, increasing latency.
Thanks
You can read the dimension row by row. In the query of the lookup component, use the globalMap entries to filter the dataset. The lookup input component should use a separate connection, to prevent the component from establishing a new connection for every lookup.
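A rough illustration of that row-by-row pattern, using Python with an in-memory SQLite database as a stand-in (table and column names are made up): one lookup connection is opened once and reused, and each incoming fact row runs a query filtered by the current business key and transaction date, which is the role the globalMap entries play in the Talend lookup query.

```python
import sqlite3

# One dedicated lookup connection, opened once and reused for every row
# (the "separate connection" advice above).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer_dim (
    sk INTEGER, customer_id TEXT, effective TEXT, expire TEXT)""")
conn.executemany(
    "INSERT INTO customer_dim VALUES (?, ?, ?, ?)",
    [(101, "C1", "2000-01-01", "2007-10-09"),
     (102, "C1", "2007-10-10", "9999-12-31")])

def lookup_sk(customer_id, txn_date):
    """One parameterized query per fact row: the filtering happens in
    the database, so the full dimension is never pulled into memory."""
    cur = conn.execute(
        """SELECT sk FROM customer_dim
           WHERE customer_id = ?
             AND ? BETWEEN effective AND expire""",
        (customer_id, txn_date))
    row = cur.fetchone()
    return row[0] if row else None

print(lookup_sk("C1", "2007-10-05"))  # 101 (old-address version)
print(lookup_sk("C1", "2007-10-15"))  # 102 (new-address version)
```

Only the one matching dimension row crosses the network per fact row, which addresses both concerns raised above, at the cost of one round trip per lookup.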