One Star

Surrogate Key in tOracleSCD

Hi all
I have setup a job to load a dimension table, using tOracleInput and tOracleSCD.
I cannot, however, find any reference anywhere to setting up the Surrogate Key for this dimension.
In the tutorials, webinars and documentation, we are shown a screen that is used to set up the standard SCD process. This has numerous tabs, for all things related to standard SCD logic. This looks completely different to what I see though in Perl TOS3.1.....The functionality also looks different, as I cannot select and map my record open and closed dates from input fields.
How do I, in Perl TOS 3.1:
a) set my Surrogate Key column
b) map my open and closed dates from source fields
Tags (1)
5 REPLIES
One Star

Re: Surrogate Key in tOracleSCD

I have done some more reading on this, and have found reference to something called the "SCD Editor", which is the view that I have seen in all the webinars, tutorials, etc. The problem though, is that I don't see a way to invoke, or open this in TOS 3.1? I don't see this option anywhere in the tOracleSCD component.....
One Star

Re: Surrogate Key in tOracleSCD

Here is an example of what I mean:
1. First 2 slides show the SCD Editor referred to in the documentation.
2. Last slide shows what I see in TOS 3.1
One Star

Re: Surrogate Key in tOracleSCD

what I see
Employee

Re: Surrogate Key in tOracleSCD

Perl tOracleSCD was not "connected" to the new GUI, we need to set
EXTENSION="org.talend.designer.scd"

in tOracleSCD_perl.xml
(and move the "Oracle to_date date format" to advanced parameters)
Can you please open a bug entry in the bugtracker related to this problem?
Now concerning available features. Perl tOracleSCD does not have surrogate key management and start/end validity dates manual management. If you need them, create feature requests please.
The current behavior is the following:
* you can't set the surrogate key (which is fine with MySQL because you let MySQL handle the SK creation with auto_increment), which is not so nice with Oracle, because you can't ask to use a specific sequence.
* the end date is set to null for the active record
* the start date is set to $_globals{job_execution_datetime} (and you can use a trick to overload it, this is what I do in tuj tOracleSCD_basics)
Create feature requests to ask which way to manage the SK (in priority). Do you simply want to call a sequence? do you want to use an input field? tell us.
By the way, you have added notes in 5485, take care that it concerns tOracleSCDELT (which is not the same component, not the same algorithm) and for Java (keep in mind that Perl and Java shares many features at GUI level, but generated code is sometimes completely different)
One Star

Re: Surrogate Key in tOracleSCD

hi
Thanks for the assistance. Have logged bug and will log feature requests now as asked.
I have a few concerns with what you said though, which I hope you will think very carefully on:
1. Any generic SCD process worth its salt should allow you to define a surrogate key. The fact that you do not for tOracleSCD is concerning, and means that most of your users on Oracle probably can't use this component, because they can't get their SK built properly... which sounds crazy to me - I honestly cant imagine why you wouldn't supply this functionality from the word go, it seems like a serious gap in basic, required functionality!
2. Very concerned that you default the closed dates to null! This should be defaulted to a high value - nulls should never be used in a data warehouse, as they are not indexable in oracle, and do not really provide any clear value or meaning. At least a high value tells you that it is a proper date, and you will know it was created by a proper process, and the values will appear in your index.
3. My biggest concern is that you are defaulting your start dates to the system date and time. This is a major issue from a data warehousing point of view, and I am actually quite surprised that you are doing this! Using system dates as the start date causes data and analysis problems in the data warehouse, because data is created one day in the source system, but because of load times, batch windows, load frequency's etc, might only be loaded 1 or more days later into the warehouse. This means that according to the date that you are using, a record might only appear to be active in the warehouse days, or even months later. This will lead to users not being able to find their data, data will be grouped in incorrect financial periods, the list of problems are endless.
Take this as an example:
You have a daily load, but because of your batch window, your load only starts at 22:00. Data captured on Monday during the day will only get loaded that night. Due to run times, etc, your loads might often extend beyond 23:59. The jobs that run past this point will therefore have a create date of the following day, making them fall into a different effective period. A user then looking for a particular record for that day, will not find what they are looking for.
What happens if this is on the end of the month, or quarter or year? Data could then fall into the next reporting month, etc which is a much more serious issue. What happens when the loads are delayed for a couple of days due to system outages, etc? Your data could come in days later.
I would have thought that any modern tool would as a minimum, default requirement, almost dictate (at least allow) for you to map a valid business effective date into this field.
Its wierd and worrying that you don't allow for that?
I will log this also as a feature request, all though, to be totally honest, I feel that this is actually a MUST HAVE.
Any way, I hope you don't mind me throwing my opinion on youSmiley Happy I do appreciate the assistance!