Data Stewardship

Four Stars

Hi, we are using the Talend Cloud Data Stewardship application and need some help with the merge campaign definition.

 

In the data quality webinar, 3 sources were combined and processed for duplicates. The suspect records were pushed to the stewardship application. In the merge campaign tasks, the group had 4 records: 3 duplicates (one from each of the 3 sources) and one suggested survivor.

 

We tried a similar approach in our DI flow. The source was a single table, and 3 duplicates were identified. These suspects were pushed to the stewardship console. The tasks had 3 records (identified as close matches) and did not have a separate golden record. Why would this happen?

 

Do we have to specify something differently in the campaign to see a distinct golden record?


Accepted Solutions
Employee

Re: Data Stewardship

Hello,

 

>Since you mentioned injection: when a set of duplicates is loaded into the stewardship console, does a separate master/golden record need to be inserted if we want to see a separate record (apart from the duplicates)?

Yes. If the golden record is not injected, it is only displayed as a possible golden record in the "New" state and is not stored.

 

>If the answer is yes, how do we do it? Is it a parameter or a flag sent by the job that loads the merge campaigns into the stewardship console?

When you inject tasks with the Studio, the record whose TDS_MASTER field is set to true is treated as the golden record for the task; the other records are the sources.
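As a rough illustration of the flag described above, here is a minimal sketch in plain Python of how the rows for one merge task could be shaped before injection. Only the field name TDS_MASTER comes from this thread; the function name, the sample columns, and the way the proposed master is built are hypothetical, and this is not the Studio component's actual API.

```python
from typing import Any


def flag_master(group: list[dict[str, Any]], master: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the rows for one merge task (hypothetical helper).

    Every source record gets TDS_MASTER = False, and the proposed
    golden record is appended with TDS_MASTER = True.
    """
    rows = [{**record, "TDS_MASTER": False} for record in group]
    rows.append({**master, "TDS_MASTER": True})
    return rows


# Three duplicates found in a single source table (made-up sample data).
duplicates = [
    {"id": 1, "name": "Jon Smith", "city": "Paris"},
    {"id": 2, "name": "John Smith", "city": ""},
    {"id": 3, "name": "John Smyth", "city": "Paris"},
]

# A proposed survivor computed upstream in the job (assumption).
proposed_master = {"id": None, "name": "John Smith", "city": "Paris"}

task_rows = flag_master(duplicates, proposed_master)
```

With the master row included, the task carries 4 records (3 sources plus one golden record), which matches what the webinar showed.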

 

>I was under the assumption that the stewardship app adds a new record when a set of duplicates is loaded. I believe this is not correct. Please confirm.

The golden record is calculated and displayed but not stored. It's only stored as the golden record after the validation of the task.

 

>Follow-up question: if only source records are injected, how does the system identify the golden record? Is it always the first record in the group?

The golden record is identified according to the survivorship rules defined in the campaign. When a campaign is designed or updated, for each attribute, a specific survivorship rule can be selected. By default, it's "first valid".
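To make the per-attribute idea concrete, here is a minimal sketch of a "first valid" style rule in plain Python. It assumes "valid" simply means present and non-blank; the real rule in the application may validate against the campaign's data model instead. All function names and sample data are hypothetical.

```python
from typing import Any, Optional


def is_valid(value: Any) -> bool:
    # Assumption: a value counts as valid when it is not None and not blank.
    return value is not None and str(value).strip() != ""


def first_valid(sources: list[dict[str, Any]], attribute: str) -> Optional[Any]:
    """Return the first valid value of `attribute`, scanning sources in order."""
    for record in sources:
        value = record.get(attribute)
        if is_valid(value):
            return value
    return None


def suggest_golden(sources: list[dict[str, Any]], attributes: list[str]) -> dict[str, Any]:
    """Build a suggested golden record by applying the rule to each attribute."""
    return {attribute: first_valid(sources, attribute) for attribute in attributes}


sources = [
    {"name": "Jon Smith", "city": "", "phone": None},
    {"name": "John Smith", "city": "Paris", "phone": None},
    {"name": "John Smyth", "city": "Lyon", "phone": "555-0100"},
]

golden = suggest_golden(sources, ["name", "city", "phone"])
```

Because the rule is applied attribute by attribute, the suggested record here takes its name from the first source, its city from the second, and its phone from the third, so it is not simply a copy of the first record in the group.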

 

Regards

 


All Replies
Employee

Re: Data Stewardship

Hi,

The suggested survivor is called the golden record. It is only recorded once the task is validated.

There are 2 possibilities:

 - Only the sources are injected, then the golden record is saved when the task is validated.

 - The sources and the possible golden record are injected, then this record is available before the validation, and the changes are saved when the task is validated.
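The two cases above can be sketched as a small check on the records of one task, written in plain Python. It relies on the TDS_MASTER flag mentioned elsewhere in this thread; the function name and the rule that a task holds at most one master row are assumptions made for illustration.

```python
from typing import Any, Optional


def split_task(records: list[dict[str, Any]]) -> tuple[Optional[dict[str, Any]], list[dict[str, Any]]]:
    """Separate an injected golden record (if any) from the source records.

    Returns (golden, sources). golden is None when only sources were
    injected, i.e. the golden record would only be saved on validation.
    """
    golden = [r for r in records if r.get("TDS_MASTER") is True]
    sources = [r for r in records if r.get("TDS_MASTER") is not True]
    if len(golden) > 1:
        # Assumption: one golden record per task.
        raise ValueError("more than one record flagged as master")
    return (golden[0] if golden else None), sources


# Case 1: only the sources are injected.
only_sources = [
    {"name": "Jon Smith", "TDS_MASTER": False},
    {"name": "John Smith", "TDS_MASTER": False},
]

# Case 2: the sources and a possible golden record are injected.
with_golden = only_sources + [{"name": "John Smith", "TDS_MASTER": True}]

golden_1, sources_1 = split_task(only_sources)
golden_2, sources_2 = split_task(with_golden)
```

In the first case nothing is available as a stored golden record before validation; in the second, the flagged row is available up front.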

 

Did I answer your question?

 

Regards

Four Stars

Re: Data Stewardship

Hi Nadia,

 

I sincerely appreciate your prompt response, and yes, it does answer my query. I still have a couple of follow-up questions to better understand the flow:

 

Since you mentioned injection: when a set of duplicates is loaded into the stewardship console, does a separate master/golden record need to be inserted if we want to see a separate record (apart from the duplicates)? If the answer is yes, how do we do it? Is it a parameter or a flag sent by the job that loads the merge campaigns into the stewardship console?

 

I was under the assumption that the stewardship app adds a new record when a set of duplicates is loaded. I believe this is not correct. Please confirm.

 

Follow-up question: if only source records are injected, how does the system identify the golden record? Is it always the first record in the group?

 

I sincerely appreciate your help here.

Four Stars

Re: Data Stewardship

Thank you, Nadia. This helps. We will check our jobs and set TDS_MASTER appropriately.

 

 
