SCD TYPE 2 - Surrogate Key generation - Spark Big Data Job


1. Create a context variable that stores MAX(surrogate_key) from the dimension table.

[screenshot: context_load.PNG]

2. Create a variable for the surrogate key:

[screenshot: tmap.PNG]

Variable (SK) = Numeric.sequence("COVER_CLASS_SK", (Integer)context.COVER_CLASS_SK + 1, 1)
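For context, Talend's `Numeric.sequence` keeps a named in-memory counter, so on a single JVM the expression above yields step-1 increments starting from MAX(SK)+1. A rough sketch of that behavior in plain Java (illustrative only, not Talend's actual implementation) makes clear why it breaks across executors:

```java
import java.util.HashMap;
import java.util.Map;

public class SequenceSketch {
    // One counter per sequence name, held in this JVM's memory only --
    // which is exactly why it cannot work across multiple Spark executors:
    // each executor JVM would hold its own independent counter.
    private static final Map<String, Integer> counters = new HashMap<>();

    // Roughly what Numeric.sequence(name, start, step) does on one JVM.
    public static synchronized int sequence(String name, int start, int step) {
        Integer current = counters.get(name);
        int next = (current == null) ? start : current + step;
        counters.put(name, next);
        return next;
    }

    public static void main(String[] args) {
        int maxSk = 100; // pretend MAX(COVER_CLASS_SK) was loaded into context
        System.out.println(sequence("COVER_CLASS_SK", maxSk + 1, 1)); // 101
        System.out.println(sequence("COVER_CLASS_SK", maxSk + 1, 1)); // 102
    }
}
```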

 

3. I understand that when running on multiple executors (Spark framework), this numeric sequence won't work.

 

4. Can someone please suggest, with an example, how to generate sequence numbers for dimension surrogate keys?



Re: SCD TYPE 2 - Surrogate Key generation - Spark Big Data Job

Hi Chandra,

 

Just to keep it here as a possible answer: since it is only a surrogate key, a UUID could work well in this case.

It could be:

  • Java - UUID.randomUUID().toString()
  • Spark - monotonically_increasing_id() (called monotonicallyIncreasingId before Spark 2.0)


In the case of a surrogate/primary key we do not need a sequential increment; the values just have to be unique.
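A minimal sketch of the UUID option in plain Java (the method name is illustrative). In a Spark job the same idea would be a UDF returning UUID.randomUUID().toString() for a string key, or monotonically_increasing_id() for a numeric one; both produce unique values with gaps, which is acceptable for a surrogate key:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class UuidSkSketch {
    // Generate one string surrogate key per incoming dimension row.
    // UUIDs need no coordination between executors, so this is safe
    // to call in parallel across a cluster.
    public static String newSurrogateKey() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Keys are unique across rows without any shared counter,
        // but they are not sequential -- fine for a surrogate key.
        Set<String> keys = new HashSet<>();
        for (int i = 0; i < 10_000; i++) {
            keys.add(newSurrogateKey());
        }
        System.out.println(keys.size());
    }
}
```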

 

Vlad
