Generated AND Persistent Surrogate Keys

One Star

Generated AND Persistent Surrogate Keys

Most tools generate only Generated Surrogate Keys. In other words, a unique, non intelligent key that is generated for each record that is created.
What I have found works very well is to also create a persistent Surrogate Key. In other words, a non unique, non intelligent key that is generated for each series of records that have the same natural key. For example, a record is created, it is assigned a unique generated sk as well as a unique persistent sk. Each time this record is changed, the new resulting records have new generated sk's created as per normal, but each subsequent record reuses the original persistent sk as well.
This might sound crazy to a lot of people, and it might sound like a waste of time, as you are probably thinking that this sounds like exactly what the original Natural Key from the source system is like.
I have recently been involved in several system migration projects where I had the responsibility of applying the change of source systems to the warehouse. In these source system changes, one of the major issues that we had to deal with were that some of the core, most important source system primary key values were been changed in the migration, with new key values being generated in the new systems. This was going to be a major issue for us as it would mean that we would not be able to trace and link the newly created keys to the original records, effectively breaking our history.
Fortunately, the company I worked for had a policy of keeping both persistent AND generated SK's. This allowed us to apply a once off patch to the data coming in to force it to pick up the original persistent sk's. This only had to be done once, as after the first time, the new records all had the old sk's. This meant that even though we no longer had the same natural keys, we were still able to trace our data back over time using our persisten key.
What I realised then is that you cant always rely on your natural key to never change, and when it does change, you have no way of liking your data over time. If you have a persistent key though, with only a very little manual intervention, this is still possible. Giving you much more flexibility in your DW.


Talend named a Leader.

Get your copy


Kickstart your first data integration and ETL projects.

Download now

Best Practices for Using Context Variables with Talend – Part 1

Learn how to do cool things with Context Variables


Migrate Data from one Database to another with one Job using the Dynamic Schema

Find out how to migrate from one database to another using the Dynamic schema


Best Practices for Using Context Variables with Talend – Part 4

Pick up some tips and tricks with Context Variables