Four Stars

tOracleSCDELT and duplicate rows

Can anyone shed some light on how the Oracle SCD components perform their updates; is it as a set of rows or row by row?

 

We're encountering an issue where if there are multiple rows for the same ID in a source table - where the non-key fields differ - the whole set of rows is inserted as 'current', rather than just the final row. For this data set it's more than likely that it can be updated more than once a day and we'd like to retain that history, however we're currently getting a massive rate of duplication.

 

For example, if we have a dimension table with the following already in place: Dimension StartDimension Start

Then run a tOracleSCDELT component on this data in staging, with UNIQUE_ID as the Source Key, and the rest Type 2 columns:Staging StartStaging StartWe expected the following result:Dimension ExpectedDimension Expected

 

 However the actual result is:

Dimension ActualDimension Actual

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Four Stars

Re: tOracleSCDELT and duplicate rows

Hi Sabrina,

I managed to find the root cause - the issue is with how we were passing data over to the component and the expected behaviour of it, rather than the component itself.

 

As we were passing in multiple change rows to the component where it expected a single row to be present, it was inserting the whole block of change rows instead of processing them one by one. We've now modified the source system to only provide the single most recent row when there's an update.

 

For anyone stumbling across the same topic - the SCD components pull from your source table by selecting the following:

SELECT * FROM sourceTable 
WHERE sourceTable.ID = targetTable.ID
AND ((sourceTable.col1 <> targetTable.col1) OR (sourceTable.col2 <> targetTable.col2) OR ... )

This means if there's more than one change row for a given ID in your source table, the whole set will be selected and inserted.

 

Version is Talend 6.2 Big Data.

2 REPLIES
Moderator

Re: tOracleSCDELT and duplicate rows

Hello,

Could you please indicate on which build version you got this issue?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Four Stars

Re: tOracleSCDELT and duplicate rows

Hi Sabrina,

I managed to find the root cause - the issue is with how we were passing data over to the component and the expected behaviour of it, rather than the component itself.

 

As we were passing in multiple change rows to the component where it expected a single row to be present, it was inserting the whole block of change rows instead of processing them one by one. We've now modified the source system to only provide the single most recent row when there's an update.

 

For anyone stumbling across the same topic - the SCD components pull from your source table by selecting the following:

SELECT * FROM sourceTable 
WHERE sourceTable.ID = targetTable.ID
AND ((sourceTable.col1 <> targetTable.col1) OR (sourceTable.col2 <> targetTable.col2) OR ... )

This means if there's more than one change row for a given ID in your source table, the whole set will be selected and inserted.

 

Version is Talend 6.2 Big Data.