Six Stars

[resolved] Leveraging MS SQL Server CDC/Replication

I understand Talend has come up with their own trigger based CDC components and implementation, but what about using the built in CDC/replication methods for specific databases, in my case, MS SQL Server?  I do not know any DBA that would allow production tables to implement any type of shadow table or trigger based implementation on a high transaction ERP or CRM.  I would never suggest that either.  The benefit of using native log-based CDC on databases is that it does not impact the production table in use nor does it slow down performance.  Having suggested CDC/replication to my clients (who have bought Talend), I did not even think that Talend may not support this (it is, after all, a Data Integration tool).  The clients have quite a few MS SQL Server source systems and have option of setting up replication through publications or CDC.  SQL Server publications are not exactly straight forward like Oracle CDC (pre-12C, although I have worked with Golden Gate as well) with physical Change tables in an actual schema.  They have "articles" and several other system tables necessary to process changes and commit "checkpoints" of read data.  I have not worked with an ETL tool that does not contain this native functionality and now that I am putting together a Talend POC for the client, I am starting to get nervous that this does not exist.  Someone please correct me if I am wrong.

Thanks,
Scott
1 ACCEPTED SOLUTION

Accepted Solutions
Six Stars

Re: [resolved] Leveraging MS SQL Server CDC/Replication

Thanks Shong, that is how I understood it as well.  The point is, most DBA's deplore triggers due to their performance impact and using triggers for replication across a wide table does not make sense in a highly transactional production database.  I am not a DBA, but have worked with enough to know that triggers are not an option in 95% of any DI solution.  So essentially, Talend isn't providing a solution.  I am currently developing two custom components for MS SQL CDC (input and output), but I am bit disappointment this has not been provided "out-of-the-box" with the paid subscription.   I was hoping to utilize some of the components as bases, but even in Talend's CDC solution, you do not provide an output that maintains transactional integrity.  Every example I have seen splits up the inserts, updates and deletes in a tmap and handles them in three separate outputs.  Anyone who has worked with change data knows this will not work because of transactional sequential integrity.  The changes have to be handled at the target in the order they happened at the source (think replication).

Thanks,
Scott
EDIT:  Found an article that discussing the impact of triggers.  It's a good read and confirms what every DBA has told me:
What is the Real Performance Impact of Triggers?
6 REPLIES
Community Manager

Re: [resolved] Leveraging MS SQL Server CDC/Replication

Hi 
The Talend CDC components for MS SQL server is based on triggers, the DB user should have right for creating triggers in database. This functionality capture the changes and synchronize the changes to a variety of target systems.
Regards
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
Six Stars

Re: [resolved] Leveraging MS SQL Server CDC/Replication

Thanks Shong, that is how I understood it as well.  The point is, most DBA's deplore triggers due to their performance impact and using triggers for replication across a wide table does not make sense in a highly transactional production database.  I am not a DBA, but have worked with enough to know that triggers are not an option in 95% of any DI solution.  So essentially, Talend isn't providing a solution.  I am currently developing two custom components for MS SQL CDC (input and output), but I am bit disappointment this has not been provided "out-of-the-box" with the paid subscription.   I was hoping to utilize some of the components as bases, but even in Talend's CDC solution, you do not provide an output that maintains transactional integrity.  Every example I have seen splits up the inserts, updates and deletes in a tmap and handles them in three separate outputs.  Anyone who has worked with change data knows this will not work because of transactional sequential integrity.  The changes have to be handled at the target in the order they happened at the source (think replication).

Thanks,
Scott
EDIT:  Found an article that discussing the impact of triggers.  It's a good read and confirms what every DBA has told me:
What is the Real Performance Impact of Triggers?
Fifteen Stars

Re: [resolved] Leveraging MS SQL Server CDC/Replication

I understand that you do not want to use the Talend CDC functionality, but I am not sure why you cannot use the SQL Server built-in functionality for this? What exactly is stopping you from doing this? 
Rilhia Solutions
Six Stars

Re: [resolved] Leveraging MS SQL Server CDC/Replication

rhall,
In short, nothing, but components are not provided to do this with SQL Server that I am aware of.  I will need to implement the functionality myself.  There is still not an output that will commit transactions in the order they are received based on the operation type, which is critical functionality.  Again, it's not that it can't be done, but the initial post was asking if these types of components existed already for MS SQL Server (It appears that it is available for Oracle using xstreams). 
Thanks,
Scott
Fifteen Stars

Re: [resolved] Leveraging MS SQL Server CDC/Replication

Oh I see what you are saying. Check the Talend Exchange and you *may* find suitable components for this. One big advantage of Talend is that if the functionality is not available out of the box you can build it yourself or borrow from someone else. 
Rilhia Solutions
Six Stars

Re: [resolved] Leveraging MS SQL Server CDC/Replication

Unfortunately no components existed on the exchange, but we did end up creating our own component and we are about to put our project using it into UAT.  It works as expected.  We did have a couple tables that did not have either a PK or a unique index defined for the table, so we had to use the "all" changes and iterate our flow so we could sequentially apply the operations to the target table using inserts, deletes, before-update image and after-update image.
We also created a tMSSqlCDCCheckpoint component that clean up the CDC data after it has been successfully processed.
Here is a screenshot showing the component, the options and the read-only schema fields that are associated with SQL Server CDC:

Thanks,
Scott