I gather from a few videos and blogs that it is possible to mask data in Talend fairly simply, using a few masking-related components.
My requirement, though, is to be able to view unmasked data depending on user role/permission. Is this really possible in Talend?
I am using Redshift as the database.
Is the idea to mask the data but keep a way to retrieve the original data based on user role/permission?
@xdshi: Not sure I understood your question, but what I want is this: when a normal user logs in, he/she sees masked data, whereas an admin or someone with higher permissions sees unmasked data.
Thanks for your confirmation. We have redirected your data masking issue to the Talend DQ experts and will come back to you as soon as we can.
Currently, we don't allow data to be unmasked after it has been masked.
That's something we're thinking about for our future roadmap, though.
One possibility with the current tDataMasking component is to output the original data along with the masked data.
Then you could build a bidirectional mapping table between the original data and the masked data.
This mapping should be carefully secured but it would allow you to get the masked data from the original data and the reverse.
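To make the mapping idea concrete, here is a minimal Python sketch of the logic only. It is not Talend or tDataMasking code: the MaskingVault class, the role names and the random-digit masking rule are all hypothetical, and in practice the mapping would live in a tightly restricted database table rather than in memory.

```python
import secrets

DIGITS = "0123456789"
# Hypothetical role names; use whatever your permission model defines.
UNMASK_ROLES = {"admin"}


class MaskingVault:
    """In-memory stand-in for a secured original <-> masked mapping table."""

    def __init__(self):
        self._to_masked = {}
        self._to_original = {}

    def _new_token(self, original):
        # Replace each digit with a random digit, keep separators as they are,
        # and retry until the token is unused so the mapping stays one-to-one.
        while True:
            token = "".join(
                secrets.choice(DIGITS) if ch in DIGITS else ch for ch in original
            )
            if token != original and token not in self._to_original:
                return token

    def mask(self, original):
        # Reuse the stored token so the same input always gets the same mask.
        if original not in self._to_masked:
            token = self._new_token(original)
            self._to_masked[original] = token
            self._to_original[token] = original
        return self._to_masked[original]

    def view(self, masked, role):
        # Only privileged roles are allowed to follow the reverse mapping.
        if role in UNMASK_ROLES:
            return self._to_original[masked]
        return masked


vault = MaskingVault()
masked_ssn = vault.mask("999-999-999")
print(vault.view(masked_ssn, "analyst"))  # masked value
print(vault.view(masked_ssn, "admin"))    # original value
```

The point of the design is that everyone reads the masked column, and only the lookup through the reverse mapping is gated by role.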
There may be several ways to do that. One possible way is by using the tMemorizeRow component https://help.talend.com/reader/rflY4_~uVcU8fbet7pa6Qg/AFbitblcOn~5QiQrw5PMSQ
Hope this helps
The masking components (like most of the Data Quality components) are not available in the free Open Studio. They are part of the Enterprise versions available through subscription.
You can have a look at the Talend Components documentation where everything is listed: https://help.talend.com/reader/PEtNf6RuyCZnH5XfH7jFow/jbsq8BiRGozMCA5I9cmH3w
I have a question related to the tDataMasking component. I am using tDataMasking to mask the input SSN field.
I found that in the initial run 999-999-999 was masked to 123-456-789, but when I received the same SSN in a second (incremental) file, 999-999-999 was masked to a different value, 789-456-123. Is there a way to mask the values deterministically, instead of randomly, to maintain data integrity?
Yes, the tDataMasking component supports several masking schemes; see https://help.talend.com/reader/0o9b5oCDP162lzXURYPZbg/QSLEkWqZwGeZVah0erPbzA
Regarding the SSN masking, it supports the bijective masking capability: https://help.talend.com/reader/0o9b5oCDP162lzXURYPZbg/DDvsI0xkSNVivuM9fMZhgA
You need to use the FPE encryption method for that.
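To show why a keyed, bijective scheme gives the same masked SSN on every run, here is a small Python sketch. It is a toy Feistel-style construction over the digits, not the FPE algorithm that tDataMasking actually uses, and it is not suitable for production; the function names and the key are made up for the illustration.

```python
import hashlib
import hmac

DIGITS = "0123456789"


def _round_value(key, round_no, data, modulus):
    # Keyed pseudo-random round function built on HMAC-SHA256.
    message = f"{round_no}:{data}".encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def mask_digits(value, key, rounds=8, decrypt=False):
    """Permute the digits of `value` under `key`, leaving separators in place."""
    digits = "".join(ch for ch in value if ch in DIGITS)
    if len(digits) < 2:
        raise ValueError("need at least two digits to mask")
    left_len = len(digits) // 2
    right_len = len(digits) - left_len
    left, right = int(digits[:left_len]), int(digits[left_len:])
    left_mod, right_mod = 10 ** left_len, 10 ** right_len
    sign = -1 if decrypt else 1
    order = reversed(range(rounds)) if decrypt else range(rounds)
    for i in order:
        # Each round changes one half using the other, so it can be undone.
        if i % 2 == 0:
            left = (left + sign * _round_value(key, i, right, left_mod)) % left_mod
        else:
            right = (right + sign * _round_value(key, i, left, right_mod)) % right_mod
    out = iter(f"{left:0{left_len}d}{right:0{right_len}d}")
    # Put the new digits back into the original layout (dashes etc. untouched).
    return "".join(next(out) if ch in DIGITS else ch for ch in value)


key = b"hypothetical-secret-key"
first_run = mask_digits("999-999-999", key)
second_run = mask_digits("999-999-999", key)
print(first_run == second_run)                    # same key, same result
print(mask_digits(first_run, key, decrypt=True))  # recovers the input
```

Because the result depends only on the key and the input, an SSN arriving in an incremental file gets the same masked value as in the initial load, which keeps joins across files intact.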
Thank you, I will give it a try.
I have one more query, about dynamically selecting the columns to be masked: I am using the tDataMasking component to mask the input columns of a delimited file. My requirement is to mask 1000+ files, each with a different schema, using a Talend job that identifies the columns to be masked dynamically for each file. In other words, I don't want to select the columns to be masked from the tDataMasking dropdown for each file. Please let me know if we can achieve this using tDataMasking or other Talend components.
I have another question related to tDataMasking.
When "SEED FOR RANDOM GENERATOR" is used in masking, the output column comes out with junk characters. The expectation is that the data should be in a readable format.
To illustrate the issue, I used the data from the Talend example, and it returned a different result.
Input - Ms Isabelle Turner
Output - Ly Çhxjuûâë Wmíøìï
SEED FOR RANDOM GENERATOR - 12345678
How can I get a readable output (i.e. English alphabet characters)?
I have no easy solution for this use case.
In the Studio, the configuration of the component is manual and the developer needs to select how to mask each column.
In Data Preparation, each column is semantically analyzed, and for columns that have a semantic type, masking can be applied automatically (we call this semantic masking).
But I don't see exactly how we could automate the two steps (semantic discovery then semantic masking) without knowing the schema of the data at first.
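Outside of Talend, the two steps (discovery, then masking) can be sketched in a few lines of Python. This only illustrates the idea and is not a tDataMasking feature: the RULES patterns, the 0.8 threshold and the function names are assumptions, and real semantic discovery is far richer than a couple of regular expressions.

```python
import csv
import io
import re

# Hypothetical detection rules: semantic label -> pattern a value must match.
RULES = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def discover_sensitive_columns(rows, threshold=0.8):
    """Return {column: label} for columns whose values mostly match a rule."""
    found = {}
    if not rows:
        return found
    for column in rows[0]:
        if column is None:  # surplus fields from ragged rows
            continue
        values = [row[column] for row in rows if row.get(column)]
        if not values:
            continue
        for label, pattern in RULES.items():
            hits = sum(1 for value in values if pattern.fullmatch(value))
            if hits / len(values) >= threshold:
                found[column] = label
                break
    return found


def mask_file(text, mask_value):
    """Step 1: discover sensitive columns. Step 2: mask only those columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    targets = discover_sensitive_columns(rows)
    for row in rows:
        for column in targets:
            if row.get(column):
                row[column] = mask_value(row[column])
    return targets, rows


sample = "id,name,tax_id\n1,Ann,123-45-6789\n2,Bob,987-65-4321\n"
targets, rows = mask_file(sample, lambda value: "X" * len(value))
print(targets)
print(rows[0])
```

The schema does not need to be known in advance here because the decision is made from the values themselves, at the cost of possible false positives and misses.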
The behavior with accented characters has been improved in the 7.2 version of the Studio.
Basically, as explained at https://help.talend.com/reader/0o9b5oCDP162lzXURYPZbg/~5JVmaygo~wT8V7uZB4RoQ:
Characters that belong to the selected alphabets are masked with characters from the same character type within the selected alphabet.
When selecting the Best guess alphabet, masked values contain characters from all alphabets represented in the input values. Best guess is the default alphabet.
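As an illustration of "same character type within the selected alphabet", here is a rough Python sketch that restricts replacements to unaccented English letters and digits. It is not the tDataMasking implementation: the mask_text function is hypothetical, and the seed is simply the one from the example above.

```python
import random
import string


def mask_text(value, seed):
    """Replace each character with a random one of the same type (ASCII only)."""
    rng = random.Random(seed)  # fixed seed -> repeatable output
    masked = []
    for ch in value:
        if ch in string.ascii_lowercase:
            masked.append(rng.choice(string.ascii_lowercase))
        elif ch in string.ascii_uppercase:
            masked.append(rng.choice(string.ascii_uppercase))
        elif ch in string.digits:
            masked.append(rng.choice(string.digits))
        else:
            # Spaces, punctuation and characters outside the alphabet are kept.
            masked.append(ch)
    return "".join(masked)


print(mask_text("Ms Isabelle Turner", 12345678))
```

Since replacements are drawn only from a-z, A-Z and 0-9, the output stays readable in the sense that it contains no accented characters, and the same seed gives the same result on every run.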