Which to choose - SCD component or manual coding of SCD?

Five Stars

Which to choose - SCD component or manual coding of SCD?

Hi, I am loading 25 tables, from mssql to postgres. Each table approximately contains 1 million rows. Now, I load it first using regular components.

 

Once, the full load is done, I want to run the delta load, meaning I want to use cdc on sql server and use the changed data to either update or insert data in the tables of postgres

 

I can do this using SCD components of talend or make my own SCD flow.

 

I prefer using the SCD component as it takes care of a lot manual update/insert code I have to write otherwise

 

I see the performance with SCD is not as good as the manual update/insert code

 

What can I do to enhance the performance of SCD? I have read in a few places, that SCD is slow, if yes, why would this component be exposed when SCD coding could be done manually?

 

I would prefer to go with SCD as maintaining type1,type2, type3 and hybrid type is managed by the SCD component. This would also prevent a lot of other bug prone code in the manual update/delete route

 

Please suggest as to how I can go about this?

 

Thanks

Forteen Stars

Re: Which to choose - SCD component or manual coding of SCD?

Hi,

 

all depends from number of daily changes

 

yes, SCD is slow, but for example:

 

You have 500 000 customers, for an average company it could be up to 2000 customers changed per day

and this with 5 SCD operations per seconds give us 400sec, 15min - not dramatical

 

but if You try to use the same technics for 100k daily updates - it is another case

-----------
Five Stars

Re: Which to choose - SCD component or manual coding of SCD?

Hi vapukov,

 

Thanks for the reply. Let's say I decide to run my job every 15 minutes and the change in data is in terms of 10-15 rows every run. 

 

And rarely , I might get a bulk change say around 5000 rows. 

 

Will using the SCD under such circumstances, be still okay?

 

Thanks
Rathi

Five Stars

Re: Which to choose - SCD component or manual coding of SCD?

Hi vapukov,

 

Any suggestions please?

 

Thanks

Employee

Re: Which to choose - SCD component or manual coding of SCD?

Hi Rathi,

 

      There is no standard benchmarkign number for the scenario you have explained. There are multiple factors which will decide the performance in this case. Some of the basic factors are CPU utilization, memory utilization, network bandwidth. Also we need to keep in mind that you need to factor the system resources for both Talend and the database you are using for the SCD. 

 

      A performance bottleneck in one single area can create problems in your plan. 

 

       Please refer the Talend reference architecture to get some high level system resources to be allocated while installing Talend.

 

https://talendpnp.github.io/architecture-center/reference-architecture/summer-2018-7.0/index

 

         But you need to talk to your DBAs and see where you can tune at DB level also. Even after all these precautions, sometimes there will be bottle necks. In those cases, you will have to see how you can parallelize your tasks, the areas where you can fine tune the flow etc.

 

         It is a continuous exercise but create some PoCs and verify in which stage you are facing the bottle necks and then you can plan the tuning exercise from that stage.

 

Warm Regards,

 

Nikhil Thampi

2019 GARTNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

Best Practices for Using Context Variables with Talend – Part 1

Learn how to do cool things with Context Variables

Blog

Migrate Data from one Database to another with one Job using the Dynamic Schema

Find out how to migrate from one database to another using the Dynamic schema

Blog

Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables

Blog