Im trying to load data from mysql db which acts as staging db in data warehouse to vertica(data warehouse) db . The flow is as follows: tmysqlinput---->main---->tmap---->main---->tverticaoutput And fetching only new data from mysql table by using left outer join on tables in both db and i have scheduled the job for each one hour. I want suggestions or logic's on how to do the following: What if there is no new data available what will be done? And what if data is coming continuously even while reading? How we check in ftp if the file is completely transferred we should have such check on data What if job gets stuck ?
Hi, So far, CDC is only available in Talend Enterprise Subscription Version. For community version, actually you can compare two tables to get updated data. tmysqlinput(target)---->main---->tmap---->main---->tverticaoutput tmysqlinput(source)---->loopup----> Make inner join in tMap and set the "Catch lookup inner join reject " as true. The processed data should be updated data. Best regards Sabrina
-- Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
If you get data continuously in a MySQL database then the task is a bit difficult to get only the updates. At first you have to take care the runtime of your job should be short. This can be done by limiting the data what will be processed within one iteration. If you detect the new data with a timestamp - call it last_modified - avoid a design in which you read from the last timestamp until now. Always limit the timestamp range to a reasonable value - say 1hour. After finishing the time range proceed with the next time range until the time range not ending with now! This way you can be sure you will not miss any new dataset. Of course you can implement CDC also in the TOS. CDC for MySQL is based on trigger procedures and you can create them for your self. These triggers must write the IDs of the new or updated datasets into a table and you can use this table for a join to get only whats changed.
hi Relating to fetching data from table ,can we make talend job run multiple times on same table to fetch data simultaneously and condition is each job should pick unique data from table and write into file. How can we achieve this in talend tool