Change Data Capture, Slowly Changing Dimensions, Type 2

One Star

Change Data Capture, Slowly Changing Dimensions, Type 2

I am trying to figure out how I can implement change data capture and am interested in knowing whats the recommended/popular way of implementing change data capture for Kimball's Type 2 Dimensions in Talend. Most specifically I find looking up for valid dimensions, based on the dimension row's effective and expiration date, challenging.
Thanks for any replies!
Tags (1)
Employee

Re: Change Data Capture, Slowly Changing Dimensions, Type 2

I'm not sure I fully understand your question. fbonan, ccarbone and plegall (me) have designed the new tMysqlSCD and I've implemented this component (Perl only, in TOS 2.1.0).
tMysqlSCD supports Kimball's Type 2 SCD.
I've not worked yet on CDC questions, I don't know if we plan to mix SCD and CDC features.
One Star

Re: Change Data Capture, Slowly Changing Dimensions, Type 2

Hi plegall, Thank you for your reply. SCD is one of the techniques to handle CDC - and Kimball's SCD Type 2 is the technique I have used. However, in the Java implementation (2.1.0) using postgresql - is there a way to implement SCD2?.
I have tried implementing it 'manually' by doing target table lookups and worked out well for loading dimensions. I am having trouble looking up valid SCD2 dimension rows when loading the fact table - because the valid row in a Type 2 dimension is based on the dimension row's effective and expiration date. Any ideas about how I can work around this limitation of tmap, to do only equals comparison vs the need for comparing the incoming row's change date to be between dimension's effective and expiration date, for loading the fact table.
Is there an advantage to keep Talend's SCD implementation database dependent?
Is there a web page where I can see the SCD feature development road map.
Thanks again,
One Star

Re: Change Data Capture, Slowly Changing Dimensions, Type 2

A Java / MySQL implementation of SCD type 1 / type 2 will be available very soon for download in the ecosystem (other databases will follow).
The main benefit to keep SCD implementation database specific is to prevent overloaded components, hard to maintain (each database has specific connection strings, escape characters...).
Edit: the component is available here : http://www.talendforge.org/ext/revision_view.php?rid=14