Change Data Capture, Slowly Changing Dimensions, Type 2

One Star

Change Data Capture, Slowly Changing Dimensions, Type 2

I am trying to figure out how I can implement change data capture and am interested in knowing whats the recommended/popular way of implementing change data capture for Kimball's Type 2 Dimensions in Talend. Most specifically I find looking up for valid dimensions, based on the dimension row's effective and expiration date, challenging.
Thanks for any replies!
Employee

Re: Change Data Capture, Slowly Changing Dimensions, Type 2

I'm not sure I fully understand your question. fbonan, ccarbone and plegall (me) have designed the new tMysqlSCD and I've implemented this component (Perl only, in TOS 2.1.0).
tMysqlSCD supports Kimball's Type 2 SCD.
I've not worked yet on CDC questions, I don't know if we plan to mix SCD and CDC features.
One Star

Re: Change Data Capture, Slowly Changing Dimensions, Type 2

Hi plegall, Thank you for your reply. SCD is one of the techniques to handle CDC - and Kimball's SCD Type 2 is the technique I have used. However, in the Java implementation (2.1.0) using postgresql - is there a way to implement SCD2?.
I have tried implementing it 'manually' by doing target table lookups and worked out well for loading dimensions. I am having trouble looking up valid SCD2 dimension rows when loading the fact table - because the valid row in a Type 2 dimension is based on the dimension row's effective and expiration date. Any ideas about how I can work around this limitation of tmap, to do only equals comparison vs the need for comparing the incoming row's change date to be between dimension's effective and expiration date, for loading the fact table.
Is there an advantage to keep Talend's SCD implementation database dependent?
Is there a web page where I can see the SCD feature development road map.
Thanks again,
Highlighted
One Star

Re: Change Data Capture, Slowly Changing Dimensions, Type 2

A Java / MySQL implementation of SCD type 1 / type 2 will be available very soon for download in the ecosystem (other databases will follow).
The main benefit to keep SCD implementation database specific is to prevent overloaded components, hard to maintain (each database has specific connection strings, escape characters...).
Edit: the component is available here : http://www.talendforge.org/ext/revision_view.php?rid=14

2019 GARNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

What’s New for Talend Summer ’19

Watch the recorded webinar!

Watch Now

Best Practices for Using Context Variables with Talend – Part 1

Learn how to do cool things with Context Variables

Blog

Migrate Data from one Database to another with one Job using the Dynamic Schema

Find out how to migrate from one database to another using the Dynamic schema

Blog

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog