One Star

tOracleSCD Component performance issues

Hi All,
I am using tOracleSCD component to create and manage a type 2 dimensions. I am doing a historical load for 14 years. As I am loading data into the SCD2 dimension, the performance of the job is deteriorating considerably. It is taking more and more time to load the dimension.
My SCD dimension contains data from 3 sources and I have three jobs to load data in the dimension, one for each source. I have a column called Source System Code in the SCD 2 dimension to identify the records from the 3 source systems.
I have two questions
1) Is there any way I can use the source system code column in the tOracleSCD object as a filter so that the SCD component is only working on records from one source system at a time?
2) How can I improve the performance of the tOracleSCD component so that with more data in the SCD2 dimension table, the performance of the job is not considerably impacted?
Vivek
2 REPLIES
Seventeen Stars

Re: tOracleSCD Component performance issues

To 1: Yes it is possible. You should use your source system code as part of the source key. In this way, the SCD component does not compare these datasets with other source system codes.
To 2: It depends in which way you use the SCD component. If you use this component in the memory saving mode (which is not the default) you have to take care, to use indexes on the type-2 columns. If you have a lot of columns it is a good idea to use a checksum field to detect changes instead of comparing every field
If you do not use the memory saving mode, the SCD component reads all related fields to compare in the memory at first. This is the fastest way but only if the amount of data sets is not so much big.
You could also think about a staging table and doing the SCD methods with the ELT component. Usually this is the fastest way.
One Star

Re: tOracleSCD Component performance issues

Thanks jlolling for the answer! I really appreciate it.
I have a few followup questions.
1) I do have a lot of type 2 columns in my type 2 dimension and I don't think that I can create index on so many columns. However I wanted to know how can I add and use the checksum field to compare changes and still be able to put the type 2 columns in the SCD2 Editor. If I put the type 2 columns in the SCD2 editor in the type 2 set of fields, won't Talend still try to compare all the fields with the table.
2) I could not find any documentation on the memory saving mode. It seems that you know a lot about that. Do you know where can I find more information about the memory saving mode and how SCD2 object works internally?
3) You mentioned that I can use the ELT component to implement SCD2. Can you please guide me a little more on that?