Masking data in Talend?

Nine Stars

Masking data in Talend?

Hi all,

I gather from a few videos and blogs that it is possible to mask data in Talend in a simple way, using a few masking-related components.

My requirement, though, is to be able to view unmasked data depending on user role/permission. Is this possible in Talend?

I am using Redshift as the database.

Thanks,

Harshal

Moderator

Re: Masking data in Talend?

Hello,

Is your idea to mask the data but keep a way to retrieve the original data based on user role/permission?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Nine Stars

Re: Masking data in Talend?

@xdshi: I'm not sure I understood your question, but what I want is this: when a normal user logs in, he/she sees masked data, whereas an admin or someone with higher permissions sees unmasked data.

Moderator

Re: Masking data in Talend?

Hello,

Thanks for your confirmation. We have forwarded your data masking question to the Talend DQ experts and will come back to you as soon as we can.

Best regards

Sabrina

Nine Stars

Re: Masking data in Talend?

That's awesome. Is there any timeline for when we can expect to hear back from them?

Employee

Re: Masking data in Talend?

Hi Parikhharshal,

Currently, we don't support unmasking data once it has been masked. That is something we're considering for our future roadmap, though.

One possible approach with the current tDataMasking component is to output the original data along with the masked data. You could then build a bidirectional mapping table between the original data and the masked data.

This mapping must be carefully secured, but it would let you get the masked data from the original data and vice versa.

There may be several ways to do that. One possible way is to use the tMemorizeRow component: https://help.talend.com/reader/rflY4_~uVcU8fbet7pa6Qg/AFbitblcOn~5QiQrw5PMSQ
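
To make the mapping idea concrete, here is a minimal sketch in plain Python (not Talend code). The `MaskingVault` class, the role names, and the `PRIVILEGED_ROLES` set are all hypothetical; in practice the mapping would more likely be a separate database table that only privileged users are allowed to read.

```python
from dataclasses import dataclass, field

# Assumption: role names are illustrative, not taken from Talend or Redshift.
PRIVILEGED_ROLES = {"admin"}


@dataclass
class MaskingVault:
    """Bidirectional mapping between original and masked values (hypothetical helper)."""

    to_masked: dict = field(default_factory=dict)
    to_original: dict = field(default_factory=dict)

    def record(self, original: str, masked: str) -> None:
        # Store both directions so lookups work either way.
        self.to_masked[original] = masked
        self.to_original[masked] = original

    def view(self, masked: str, role: str) -> str:
        # Only privileged roles may resolve a masked value back to the original;
        # everyone else gets the masked value unchanged.
        if role in PRIVILEGED_ROLES:
            return self.to_original.get(masked, masked)
        return masked


vault = MaskingVault()
vault.record("999-999-999", "123-456-789")
print(vault.view("123-456-789", "admin"))
print(vault.view("123-456-789", "analyst"))
```

The point of the design is that ordinary users only ever query the masked data, while the mapping itself is the single sensitive asset that has to be locked down.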

 

Hope this helps

Nine Stars

Re: Masking data in Talend?

@scorreia: Thanks a lot for your reply.

Can you please shed some light on creating the mapping table between the original data and the masked data?

How would one user then access the masked data while a user with higher rights accesses the unmasked data?
Six Stars

Re: Masking data in Talend?

I can't seem to find this component (tDataMasking) in Open Studio for Big Data. Is it available there, or in MDM?

Employee

Re: Masking data in Talend?

Hi,

The masking components (like most of the Data Quality components) are not available in the free Open Studio. They are part of the Enterprise versions, available by subscription.

You can have a look at the Talend Components documentation, where everything is listed: https://help.talend.com/reader/PEtNf6RuyCZnH5XfH7jFow/jbsq8BiRGozMCA5I9cmH3w

 

Cheers,

Patrick

Nine Stars

Re: Masking data in Talend?

@ppeinoit: Thanks for the reply.

Five Stars

Re: Masking data in Talend?

Hi,

I have a question related to the tDataMasking component. I am using tDataMasking to mask the input SSN field.

In the initial run, 999-999-999 was masked to 123-456-789, but when I received the same SSN in a second (incremental) file, 999-999-999 was masked to a different value, 789-456-123. Is there a way to mask the values deterministically, instead of randomly, to maintain data integrity?

Employee

Re: Masking data in Talend?

Hi Naveen,

Yes, the tDataMasking component supports several masking schemes; see https://help.talend.com/reader/0o9b5oCDP162lzXURYPZbg/QSLEkWqZwGeZVah0erPbzA

For SSN masking, it supports bijective masking: https://help.talend.com/reader/0o9b5oCDP162lzXURYPZbg/DDvsI0xkSNVivuM9fMZhgA

You need to use the FPE (format-preserving encryption) method for that.
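
To illustrate what "bijective" and "format-preserving" mean here, the following is a toy sketch in Python. It is not Talend's implementation and not a standardized FPE mode such as NIST FF1; it is a small keyed Feistel permutation with cycle-walking over nine-digit values, and the function names and the demo key are assumptions made for the example. The same key and input always give the same output, distinct inputs never collide, and the key holder can reverse the masking.

```python
import hashlib
import hmac

DIGITS = "0123456789"
DOMAIN = 10**9                      # all nine-digit values
HALF_BITS = 15                      # two 15-bit halves cover 2**30 >= 10**9
HALF_MASK = (1 << HALF_BITS) - 1
ROUNDS = 8


def _round(key: bytes, rnd: int, half: int) -> int:
    # Keyed pseudo-random round function built from HMAC-SHA256.
    msg = bytes([rnd]) + half.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big") & HALF_MASK


def _permute(key: bytes, value: int, decrypt: bool) -> int:
    # One pass of a balanced Feistel network over a 30-bit value.
    left, right = value >> HALF_BITS, value & HALF_MASK
    rounds = range(ROUNDS - 1, -1, -1) if decrypt else range(ROUNDS)
    for rnd in rounds:
        if decrypt:
            left, right = right ^ _round(key, rnd, left), left
        else:
            left, right = right, left ^ _round(key, rnd, right)
    return (left << HALF_BITS) | right


def _walk(key: bytes, value: int, decrypt: bool) -> int:
    # Cycle-walking: re-apply the permutation until the result is a nine-digit value.
    value = _permute(key, value, decrypt)
    while value >= DOMAIN:
        value = _permute(key, value, decrypt)
    return value


def _transform(key: bytes, text: str, decrypt: bool) -> str:
    digits = [c for c in text if c in DIGITS]
    if len(digits) != 9:
        raise ValueError("expected exactly nine digits")
    out = iter(f"{_walk(key, int(''.join(digits)), decrypt):09d}")
    # Put the new digits back in place, keeping separators such as '-'.
    return "".join(next(out) if c in DIGITS else c for c in text)


def mask(key: bytes, text: str) -> str:
    return _transform(key, text, decrypt=False)


def unmask(key: bytes, text: str) -> str:
    return _transform(key, text, decrypt=True)


key = b"demo-key"  # assumption: a real key would come from a secret store
masked = mask(key, "999-999-999")
print(masked, unmask(key, masked))
```

Because the result depends only on the key and the input, an incremental file containing the same SSN gets the same masked value as the first run, which is what preserves referential integrity across loads.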

 

Best regards

Five Stars

Re: Masking data in Talend?

Thank you, I will give it a try.

I have one more question, about dynamically selecting the columns to be masked. I am using the tDataMasking component to mask the input columns of a delimited file. My requirement is to mask 1000+ files, each with a different schema, using a Talend Job that identifies the columns to be masked dynamically for each file. In other words, I don't want to select the columns to be masked from the tDataMasking drop-down for each file. Please let me know whether this can be achieved using tDataMasking or other Talend components.

Five Stars

Re: Masking data in Talend?

Hi,

I have another question related to tDataMasking.

When "SEED FOR RANDOM GENERATOR" is used in masking, the output column contains junk characters. I expected the data to be in a readable format.

To illustrate the issue, I used the data from the Talend example, and it returned a different result:

Input - Ms Isabelle Turner
Output - Ly Çhxjuûâë Wmíøìï
SEED FOR RANDOM GENERATOR - 12345678

How can I get readable output (i.e. English alphabet characters)?

Employee

Re: Masking data in Talend?

Hi,

I have no easy solution for this use case.

In the Studio, the component is configured manually: the developer needs to select how to mask each column.

In Data Preparation, each column is semantically analyzed, and columns that have a semantic type can be masked automatically (we call this semantic masking).

But I don't see how we could automate the two steps (semantic discovery, then semantic masking) without knowing the schema of the data first.
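
For what it's worth, the "discovery" step can be roughly approximated outside Talend by sampling values instead of relying on a known schema. The sketch below is plain Python, not a Talend feature; the `columns_to_mask` function, the sample size, the 80% threshold, and the SSN-like pattern (taken from the 999-999-999 examples in this thread) are all assumptions for illustration.

```python
import csv
import io
import re

# Assumption: sensitive values look like the 999-999-999 examples in this thread.
SSN_LIKE = re.compile(r"^\d{3}-\d{3}-\d{3}$")


def columns_to_mask(csv_text: str, sample_rows: int = 100, threshold: float = 0.8) -> list:
    """Return the header names whose sampled values mostly match SSN_LIKE."""
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = {name: 0 for name in (reader.fieldnames or [])}
    total = 0
    for row in reader:
        if total >= sample_rows:
            break
        total += 1
        for name in hits:
            # Missing cells come back as None, so fall back to an empty string.
            if SSN_LIKE.match(row[name] or ""):
                hits[name] += 1
    return [name for name, n in hits.items() if total and n / total >= threshold]


sample = "id,name,ssn\n1,Ann,999-999-999\n2,Bob,123-456-789\n"
print(columns_to_mask(sample))
```

The resulting column list would still have to be fed into whatever performs the masking, which is exactly the part that has no ready-made automation in the Studio.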

 

Employee

Re: Masking data in Talend?

The behavior with accented characters has been improved in version 7.2 of the Studio.

Basically, as explained at https://help.talend.com/reader/0o9b5oCDP162lzXURYPZbg/~5JVmaygo~wT8V7uZB4RoQ

Characters that belong to the selected alphabets are masked with characters from the same character type within the selected alphabet.

When selecting the Best guess alphabet, masked values contain characters from all alphabets represented in the input values. Best guess is the default alphabet.
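
As a rough illustration of "same character type within the selected alphabet", here is a small Python sketch (not the tDataMasking algorithm). It assumes the selected alphabet is basic Latin: uppercase letters are replaced by uppercase letters, lowercase by lowercase, digits by digits, and everything else is left untouched. The `mask_text` name is hypothetical, and the seed only makes the run repeatable.

```python
import random
import string


def mask_text(value: str, seed: int) -> str:
    """Replace each ASCII letter or digit with a random one of the same type."""
    rng = random.Random(seed)  # a fixed seed gives the same output on every run
    out = []
    for ch in value:
        if ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        elif ch in string.digits:
            out.append(rng.choice(string.digits))
        else:
            # Spaces, punctuation and characters outside the alphabet are kept as-is.
            out.append(ch)
    return "".join(out)


print(mask_text("Ms Isabelle Turner", 12345678))
```

With the replacement pool restricted this way, the output stays within English letters, which is the readable result asked for above.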

 

About supported characters: https://help.talend.com/reader/IetzD0OTgeEjWYQD77eKPw/HebSGq_ek_lNpZJFY6ZfQA

 

Five Stars

Re: Masking data in Talend?

Thank you so much! I will give it a try and get back to you.
