Data staging

Six Stars

Data staging

Hello everyone, I would need to create a staging area and then replicate the tables of one or more databases and keep the data inside them aligned with a unique external and independent datatbase. This system will have to manage new entries, updates and deletions from one system to another. Is there any document or some example from which to take inspiration? thank you

Employee

Re: Data staging

Hi,

 

    If my understanding about your query is correct, you are looking for a stage area followed by the final data resting place which needs to be replicated further.

 

    There are multiple ways you can do this use case. For example, you can use Data replication tools to do data movement between the target database and its replicated database. Another way is to send the final data loaded to a message system which will transfer the data to replication area.

 

    Another query I am having is that are you looking for strict replication or eventual replication of databases. Volume is another factor which will decide where to replicate. One more factor I could think is the typr of source and replication databases where you are trying to do this process. For example, if everything is Oracle or MSSQLServer, then I would advise to use the DB replication tools. But if its totally different type of database or the back up is to a Hadoop Datalake, then you will have to take your approach in different way.

 

   So if you could give more details about your use case, we can discuss further on how we can implement the use case.


Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

 


Warm Regards,
Nikhil Thampi
Please appreciate our members by giving Kudos for spending their time for your query. If your query is answered, please mark the topic as resolved :-)
Highlighted
Six Stars

Re: Data staging

what I would need is summarized in the diagram below

Six Stars

Re: Data staging

schema.png

Employee

Re: Data staging

Ok. Perfect. You are trying to load the data from source systems to staging and final resting areas in Data warehouse.

 

If you want near realtime data pumping from source systems to DW, you can opt for a Kafka or similar message systems to transmit the data from source systems to target staging area using Talend. You can also go for delta data transfer also in batch mode using Talend tool using Oracle components.

 

Another option is to use the CDC feature of Talend. The most real time way to do it is through Talend and Kafka as you are not using any batch mode processing in this method.


Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

 


Warm Regards,
Nikhil Thampi
Please appreciate our members by giving Kudos for spending their time for your query. If your query is answered, please mark the topic as resolved :-)
Six Stars

Re: Data staging

thank you. I do not need to load data in real time, but it is sufficient to load them in batch mode (example every 10/20 minutes). would you have an example from where to start work and understand how I can implement the development?

Employee

Re: Data staging

Hi,

 

    If you are looking for batch mode of processing, you can refer the traditional database components (Oracle) for understanding how the data is extracted and how it is loaded. If the data volume is too big, please use the Bulk components to load the data to target systems.

 

https://help.talend.com/reader/iYcvdknuprDzYycT3WRU8w/3CuxPiH0WZnEDpaNOJEcjA

 

    There are example scenarios for these components in the help document which may help you to understand the component in detail.


Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

 


Warm Regards,
Nikhil Thampi
Please appreciate our members by giving Kudos for spending their time for your query. If your query is answered, please mark the topic as resolved :-)
Six Stars

Re: Data staging

thanks but I can not understand how to intercept the updates and deletes of the records :-(

Employee

Re: Data staging

Hi,

 

     You can send the updates and Deletes also to the downstream system using either of the above methodologies. Since you are using the Datawarehousing model, the moment you identify a matched record, you can set the expiry date column (or is deleted flag) of that record to mark that the previous version is set as expired. In case of an update, you can insert the new record with expiry column as empty.

 

     This will mean that the new record which you have inserted (which came as update record from source) is the active record. But if you want to look the history at any point of time, you can always fetch the records with expiry date marked also so that you will understand how the record got changed over a period of time.


Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved :-)

 


Warm Regards,
Nikhil Thampi
Please appreciate our members by giving Kudos for spending their time for your query. If your query is answered, please mark the topic as resolved :-)

Cloud Free Trial

Try Talend Cloud free for 30 days.

Tutorial

Introduction to Talend Open Studio for Data Integration.

Definitive Guide to Data Integration

Practical steps to developing your data integration strategy.

Definitive Guide to Data Quality

Create systems and workflow to manage clean data ingestion and data transformation.