I have an Oracle database where I will monitor a table with changes (CDC).
I want to solve 2 problems in my design:
I want to make a cross between a CDC table and a table with a lot of data, they are not the same table. I need speed and low memory consumption.
Not all CDC table changes can be processed in the current run, so I want to prevent data loss during extraction.
Solution Problem 1)
I have to cross this table with another table but the lookup table has more than 11 million records.
I have thought several options to achieve the goal:
Option a) Extract the big table with TOracleInput and TOracleCDC for the table with changes and use a TMap to make the inner join.
Problem: High memory consumption when extracting data.
Option b) Perform the inner join as an SQL query between the table with changes and the large data table and I only have to extract the joined results.
Due to synchronization problems I may have some changes that can not be processed in that run because the process that updates me the data that allow inner join are not available during that time.
Then I would have to save the results to delete the changes I could process with
tHashInput (Changes processed) -> tOracleOuput (DELETE Changes in table with CDC)
Is it possible to delete changes to a CDC table using tOracleOuput?
I need to keep the changes in the CDC table because I don't want all the changes from the last run to be deleted.
change1 from CDC-> the change can be processed
change2 from CDC-> cannot yet process change
change3 from CDC -> change cannot be processed yet
So if you just process change1, then I want CDC to be such that way:
change2 of CDC
In the next run of the job I would extract them again along with the new changes.
I don't know if anyone can help me design Talend to achieve my goal.
In this community post, https://community.talend.com/t5/Design-and-Development-Internal/Does-Oracle-CDC-Xstream-mode-support... it suggest to keep "Table with CDC" value blank to get all the changes with the object name, and the filter there. Do you think that is a possible approach here?
Talend named a Leader.
Kickstart your first data integration and ETL projects.
Part 2 of a series on Context Variables
Learn how to do cool things with Context Variables
Find out how to migrate from one database to another using the Dynamic schema