One Star

Loading data from mysql db to vertica db

Hi

Im trying to load data from mysql db which acts as staging db in data warehouse to vertica(data warehouse) db .
The flow is as follows:
tmysqlinput---->main---->tmap---->main---->tverticaoutput
And fetching only new data from mysql table by using left outer join on tables in both db and i have scheduled the job for each one hour.
I want suggestions or logic's on how to do the following:
What if there is no new data available what will be done?
And what if data is coming continuously even while reading?
How we check in ftp if the file is completely transferred we should have such check on data
What if job gets stuck ?
6 REPLIES
Moderator

Re: Loading data from mysql db to vertica db

Hi,
For your requirement, are you looking for TalendHelpCenter:Talend CDC architectural overview
TalendHelpCenter:Change Data Capture CDC is easy with Talend Studio.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Loading data from mysql db to vertica db

Hi
Im using TOS, how to achieve CDC in this
Moderator

Re: Loading data from mysql db to vertica db

Hi,
So far, CDC is only available in Talend Enterprise Subscription Version.
For community version, actually you can compare two tables to get updated data.
tmysqlinput(target)---->main---->tmap---->main---->tverticaoutput
tmysqlinput(source)---->loopup---->
Make inner join in tMap and set the "Catch lookup inner join reject " as true. The processed data should be updated data.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Loading data from mysql db to vertica db

hi
Okay that is one way and is there any triggering options in TOS if new data arrives in any DB so at that time we can run a job to read those data.
Seventeen Stars

Re: Loading data from mysql db to vertica db

If you get data continuously in a MySQL database then the task is a bit difficult to get only the updates.
At first you have to take care the runtime of your job should be short. This can be done by limiting the data what will be processed within one iteration. If you detect the new data with a timestamp - call it last_modified - avoid a design in which you read from the last timestamp until now. Always limit the timestamp range to a reasonable value - say 1hour.
After finishing the time range proceed with the next time range until the time range not ending with now! This way you can be sure you will not miss any new dataset.
Of course you can implement CDC also in the TOS. CDC for MySQL is based on trigger procedures and you can create them for your self. These triggers must write the IDs of the new or updated datasets into a table and you can use this table for a join to get only whats changed.
One Star

Re: Loading data from mysql db to vertica db

hi
Relating to fetching data from table ,can we make talend job run multiple times on same table to fetch data simultaneously and condition is each job should pick unique data from table and write into file. How can we achieve this in talend tool