I am trying to set up real-time change data capture (CDC) between two different MySQL databases using Talend Studio.
I have successfully created a job that uses the publish/subscribe model to pick up only the changed data from the source and write it to the target database.
However, I could not find documentation on setting up CDC in real time, i.e. as soon as a new row is inserted into the source database, the job picks it up and writes it to the target database. The Talend job would run continuously, watching the source for changes.
My question: is scheduling the Talend job with some scheduler to run at the desired interval the only option here? What options does Talend Studio offer to achieve this?
Thanks in advance.
If you are looking for a real-time solution, you may want to use Talend ESB. The process would essentially remain the same, but a Talend (Apache Camel) route would monitor for changes. When a change occurs, the route triggers your job to update the target DB.
Thanks for your reply. I have designed a job whose source is the tMySqlCDC component. This component tracks the changes made since the job's last execution, so it is essentially capturing the change data. What is missing is that I still have to run the job for the changes to be reflected in the target database. How do I modify this job so that it keeps looking for changes in the source database continuously, i.e. once started, it keeps running and keeps the source and target databases in sync?
Thanks once again.
Data integration jobs are batch: they start and they end. What you need is a Talend (Camel) route. A route stays on at all times and can monitor a database or a folder for changes. This requires Talend ESB; you cannot achieve this with a job alone.
You can create a cron trigger in TAC (Talend Administration Center) that runs the job every 15 minutes. That way you do not have to worry about triggering the job yourself: each run pulls the changed data into your target, every 15 minutes. This gives you near-real-time CDC. Depending on the average job completion time, you can also shorten the interval to every 5 minutes.
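To make the batch-versus-always-on distinction concrete, here is a minimal plain-Java sketch. It does not use the Talend or Camel APIs: the in-memory queue is a hypothetical stand-in for whatever delivers change events to a route, and the map stands in for the target table. The point is only that the consumer loop stays up and applies each change as it arrives, instead of starting, processing one batch, and exiting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical change event: an upsert of one row, keyed by primary key.
final class ChangeEvent {
    final int id;
    final String value;

    ChangeEvent(int id, String value) {
        this.id = id;
        this.value = value;
    }
}

// Always-on consumer: unlike a batch job, it does not exit after one pass.
// It waits for change events and applies each one to the target as it arrives.
class AlwaysOnSync implements Runnable {
    private final BlockingQueue<ChangeEvent> changes; // stand-in for the change feed
    private final Map<Integer, String> target;        // stand-in for the target table
    private volatile boolean running = true;

    AlwaysOnSync(BlockingQueue<ChangeEvent> changes, Map<Integer, String> target) {
        this.changes = changes;
        this.target = target;
    }

    @Override
    public void run() {
        // Keep going while running; after stop(), drain whatever is already queued.
        while (running || !changes.isEmpty()) {
            try {
                // Wait briefly so the stop flag is re-checked regularly.
                ChangeEvent e = changes.poll(100, TimeUnit.MILLISECONDS);
                if (e != null) {
                    target.put(e.id, e.value); // upsert: the latest value wins
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Ask the loop to finish once the queue is empty.
    void stop() {
        running = false;
    }
}
```

Run it on its own thread; in a real deployment the loop would live as long as the route is deployed. The target map is written only by the consumer thread, so read it after joining that thread (or pass in a thread-safe map).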
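The scheduled approach boils down to "pull only what changed since the last run". Below is a minimal plain-Java sketch of that pattern, not of tMySqlCDC's actual internals: a hypothetical change log with increasing sequence numbers stands in for the CDC change table, and a stored watermark (`lastSeq`) records how far the previous run got. `ScheduledExecutorService` plays the role that the TAC trigger plays in the answer above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical change-log row: a sequence number plus the upserted row.
final class LoggedChange {
    final long seq;
    final int id;
    final String value;

    LoggedChange(long seq, int id, String value) {
        this.seq = seq;
        this.id = id;
        this.value = value;
    }
}

// Each scheduled run applies only changes newer than the watermark.
// Assumes the change log is ordered by ascending seq.
class IncrementalSync {
    private final List<LoggedChange> changeLog; // stand-in for the CDC change table
    private final Map<Integer, String> target;  // stand-in for the target table
    private long lastSeq = 0;                   // watermark kept between runs

    IncrementalSync(List<LoggedChange> changeLog, Map<Integer, String> target) {
        this.changeLog = changeLog;
        this.target = target;
    }

    // Returns the number of changes applied in this run.
    synchronized int runOnce() {
        int applied = 0;
        for (LoggedChange c : changeLog) {
            if (c.seq > lastSeq) {
                target.put(c.id, c.value);
                lastSeq = c.seq;
                applied++;
            }
        }
        return applied;
    }

    // Run now, then again `minutes` minutes after each run finishes.
    static ScheduledFuture<?> schedule(ScheduledExecutorService ex, IncrementalSync sync, long minutes) {
        return ex.scheduleWithFixedDelay(sync::runOnce, 0, minutes, TimeUnit.MINUTES);
    }
}
```

Calling `runOnce()` twice in a row applies nothing the second time, so shortening the interval from 15 to 5 minutes only changes latency, not what gets copied. Fixed delay (rather than fixed rate) was chosen so that each run starts a full interval after the previous one finishes, which matches the advice to size the interval to the average job completion time.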
Talend CDC is not real time.
You can look at:
You can push data from CDC to Kafka and then consume the Kafka topic with Talend; this works.
These are all based on the native replication protocol and work without overloading the server with triggers.
Say I have Talend ESB set up. I imagine the solution would be some c-component connected to a cTalendJob that runs the CDC job.
Which c-component should I use?