SCD TYPE 2 - Surrogate Key generation - Spark Big Data Job


1. Create a context variable that stores MAX(surrogate_key) from the dimension table:

 context_load.PNG

2. Create a tMap variable for the surrogate key:

tmap.PNG

Variable (SK) = Numeric.sequence("COVER_CLASS_SK", (Integer) context.COVER_CLASS_SK + 1, 1)
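For context, the routine above keeps a named counter inside a single JVM; a rough plain-Java sketch of that behavior (an assumption about how `Numeric.sequence` is implemented, not Talend's actual source) looks like this:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class SequenceSketch {
    // Hypothetical per-JVM counter map mimicking Numeric.sequence:
    // one counter per sequence name, shared by all callers in this JVM.
    private static final Map<String, AtomicLong> SEQUENCES = new ConcurrentHashMap<>();

    public static long sequence(String name, long start, long step) {
        AtomicLong counter =
                SEQUENCES.computeIfAbsent(name, k -> new AtomicLong(start - step));
        return counter.addAndGet(step);
    }

    public static void main(String[] args) {
        long maxSk = 100; // stand-in for context.COVER_CLASS_SK loaded in step 1
        System.out.println(sequence("COVER_CLASS_SK", maxSk + 1, 1)); // 101
        System.out.println(sequence("COVER_CLASS_SK", maxSk + 1, 1)); // 102
    }
}
```

Because the counter lives in one JVM, every Spark executor would start its own copy at `maxSk + 1`, which is exactly the collision problem described in step 3.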

 

3. I understand that when running on multiple executors (the Spark framework), the numeric sequence won't work, since each executor would keep its own counter.

 

4. Can someone please suggest, with an example, how to generate sequence numbers for dimension surrogate keys?
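For reference, one common cross-executor-safe pattern (an assumption, not something proposed in this thread) is to add the dimension's current MAX(surrogate_key) to a dense 0-based row index, such as the one `zipWithIndex` produces on an RDD. A single-JVM sketch of just the arithmetic:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class MaxPlusIndex {
    // maxSk: current MAX(surrogate_key) from the dimension table.
    // rowCount: number of new rows to key. In a distributed job,
    // zipWithIndex would supply the 0-based index per row.
    static List<Long> newKeys(long maxSk, long rowCount) {
        return LongStream.range(0, rowCount)
                .map(i -> maxSk + 1 + i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(newKeys(100, 3)); // [101, 102, 103]
    }
}
```

This keeps keys both unique and contiguous, at the cost of an extra pass to index the rows.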

 

 

 

 


Re: SCD TYPE 2 - Surrogate Key generation - Spark Big Data Job

Hi Chandra,

 

Just to keep it here as a possible answer: because it is just a surrogate key, a UUID could work well in this case.

It could be:

  • Java - UUID.randomUUID().toString()
  • Spark - monotonically_increasing_id() (the older monotonicallyIncreasingId alias is deprecated)

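As an aside, `monotonically_increasing_id` stays unique across executors because it packs the partition id into the upper bits of the id; a rough single-JVM sketch of that encoding (an assumption based on Spark's documented layout of 31 partition bits over 33 record bits, not Spark's actual code):

```java
public class MonotonicIdSketch {
    // Hypothetical re-creation of the monotonically_increasing_id layout:
    // upper bits carry the partition id, lower 33 bits the row's position
    // within that partition, so ids never collide across executors.
    static long id(int partitionId, long rowInPartition) {
        return ((long) partitionId << 33) | rowInPartition;
    }

    public static void main(String[] args) {
        System.out.println(id(0, 0)); // 0
        System.out.println(id(0, 1)); // 1
        System.out.println(id(1, 0)); // 8589934592 = 2^33
    }
}
```

Note the resulting ids are unique and increasing, but not consecutive, which is why they suit surrogate keys rather than human-facing sequence numbers.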

In the case of a surrogate primary key we do not need a sequential increment; the values just must be unique.
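The UUID option above can be sketched in plain Java; the point is that randomly generated keys are (for practical purposes) collision-free even though they carry no ordering:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class UuidKeys {
    public static void main(String[] args) {
        // Surrogate keys only need to be unique, not sequential:
        Set<String> keys = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(UUID.randomUUID().toString());
        }
        System.out.println(keys.size()); // 1000 - all distinct
    }
}
```

The trade-off is a 36-character string key instead of a compact numeric one, which can matter for dimension-to-fact join performance and storage.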

 

Vlad
