Five Stars

Talend CDC with Oracle

Hello everyone,

I want to connect to Oracle Database and monitor the changes on a source table and need to capture the changes every 5 mins and put the updates and inserts in a separate mysql tables. I also want to do a few data quality checks in between too. What is the best CDC approach for this? Trigger Based or Log based? And can someone show me an sample example of the dataflow.

 

Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Employee

Re: Talend CDC with Oracle

Hi Pras1258,

 

     Since it is a trigger, the data will be moving to the interim table as soon as it is loaded into source database. You will not be able to control that aspect. Also if we are trying to fetch the change data from source every 10 minutes, we are actually diluting the core essence of CDC.

     For example, any change to a record which might have happened in between those 10 minutes will not be recorded in CDC table. 

     If you want to poll the source tables, every 10 minutes, a good idea will be to use tDBInput and add a where clause to pick only the interval data.

     Coming to your second query, if the data is not extracted from interim table by tOracleCDC component, it will remain there in the interim table. Another scenario is that you have extracted from stage table using that component but your job failed before loading to target. In that case, better to extract and keep to a temp file as first stage so that you can refer that file to restart the job rather than extracting all the way from source (by default, CDC interim table will be automatically cleared once you extract data using tOracleCDC). You can go to advanced option and select the tick box for Keep data in CDC table. But in that has, the onus of deleting data will be on you.

 

Warm Regards,

 

Nikhil Thampi

Tags (2)
8 REPLIES
Employee

Re: Talend CDC with Oracle

Hi, You can use trigger based approach if you want to maintain the CDC irrespective of the Oracle version. But if the Oracle DB version is 11 or less, you can go for Redo log based approach. There is another approach called XStream if your DB is Oracle 12 with ODI. The trigger based approach need less DBA intervention but there will be an overhead in DB as we will be indirectly generating DB triggers. The redo log approach will be less obstructive to DB as it is reading change records from redo log files. But the amount of DBA intervention and access privileges will be more compared to Trigger based approach. Talk to your DBAs and get their views also and depending on the flexibility of your organization, you can choose one among the options.
Employee

Re: Talend CDC with Oracle

Hi, I would also suggest to go through the below link for the details related to CDC https://help.talend.com/reader/St~M252yz9qZ1qf9VbrV4A/WmevQ3utmOEKzEXAUKCvSA
Tags (1)
Five Stars

Re: Talend CDC with Oracle

Thanks for the reply Nikhil.

 

My company maintains Oracle 10g and 11g databases. The tables I am watching contains around half a million rows of data and I want to capture the changes once every 15 minutes only. My DBA wants us to make sure there is no impact on the performance of the database. At this point, what would be your suggestion? Log based or Trigger based?

 

Do we use some sort of a scheduler to bring the Inserts,updates once very 15 mins or does have any functionality for that? 

Employee

Re: Talend CDC with Oracle

Hi, I would suggest you to start with trigger based approach and see the performance results in one of your lower environments. If the data change frequency is less, you can easily manage the flow without much DBA privileges like in redo log flow. Another advantage of Trigger based approach is that it is compatible even if you are moving to version of Oracle DB at later point of time. If there are any performance hiccups, try redo log CDC for the same data set and compare the results. This will give you a good bench mark. The frequency of data extraction from CDC table can be assigned in TAC and the component to use for data extraction is tOracleCDC. You can refer the details from below link. https://help.talend.com/reader/NNO~fmVQU4rlkF9Depfdxw/JXvgNTorRK1zf3XyoitGag If you think my answer is an acceptable solution, please don't forget to mark it :-)
Tags (1)
Five Stars

Re: Talend CDC with Oracle

Ok. Let me reconfirm this. So, the CDC monitors the changes to the source table and published in a change table. So the change table wants to get the changes once every 15 mins. Can you provide me a sample ETL job to help me on this? That would be awesome.

 

 

Employee

Re: Talend CDC with Oracle

Hi,

 

        Unfortunately I do not have a Talend code handy at this point of time. But I have done Trigger based CDC sometime back for a customer and it worked fine. The steps mentioned in the Talend document is sufficient to create sample jobs.

 

        Again I would like to give a word of caution that ultimately the amount of change data in underlying database will determine the performance of CDC. So I would strongly recommend to do performance tests before going to production.

 

Warm Regards,

Nikhil Thampi

 

Five Stars

Re: Talend CDC with Oracle

Thanks Nikhil.

Can I slow down the CDC Triggers to fetch the data from source only every 10 mins and then load that modified data into a Staging Database and then I want to push that data into a target data warehouse?

What would you do if data didn't successfully load from stage to target but it made from source to staging? What would you do to only move from stage to target when such situation occurs.

 

 

Employee

Re: Talend CDC with Oracle

Hi Pras1258,

 

     Since it is a trigger, the data will be moving to the interim table as soon as it is loaded into source database. You will not be able to control that aspect. Also if we are trying to fetch the change data from source every 10 minutes, we are actually diluting the core essence of CDC.

     For example, any change to a record which might have happened in between those 10 minutes will not be recorded in CDC table. 

     If you want to poll the source tables, every 10 minutes, a good idea will be to use tDBInput and add a where clause to pick only the interval data.

     Coming to your second query, if the data is not extracted from interim table by tOracleCDC component, it will remain there in the interim table. Another scenario is that you have extracted from stage table using that component but your job failed before loading to target. In that case, better to extract and keep to a temp file as first stage so that you can refer that file to restart the job rather than extracting all the way from source (by default, CDC interim table will be automatically cleared once you extract data using tOracleCDC). You can go to advanced option and select the tick box for Keep data in CDC table. But in that has, the onus of deleting data will be on you.

 

Warm Regards,

 

Nikhil Thampi

Tags (2)