One Star

Data masking and anonymization using Talend

I have a requirement to migrate data from a source database to a destination database. We also want to make sure that confidential data such as passwords and phone numbers is not exposed in the destination database. We are planning to use the Talend data integration tool for this, but I couldn't find any information about data masking in the user manual.
Could someone please let me know whether we can perform data masking/anonymization using Talend?
Thanks in advance for any help you are able to provide.
-Satish
15 REPLIES
Community Manager

Re: Data masking and anonymization using Talend

Hi Satish
No, you can't use Talend Studio to perform data masking; it is an ETL tool, not a data masking tool.
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: Data masking and anonymization using Talend

As highlighted by Shong, there is no dedicated masking functionality. However, there are ways of achieving a similar result with what is available within Talend, for example replacing values with random ones, or doing a lookup/replace (to maintain consistency) using the map component.
However, do also recognize that it would be easier if the transformation were done pre- or post-migration, perhaps even with a dedicated tool where the data modeling can be done more elaborately.
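The lookup/replace idea above can be sketched in plain Java. This is only an illustration under stated assumptions: `ConsistentMasker` and its methods are hypothetical names, not Talend components or APIs, and the class would have to be adapted (for example into a user routine called from a map expression) before use in a job. It keeps an in-memory lookup so that the same original value always receives the same random surrogate. It does not guarantee that two different originals get different surrogates.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, not part of Talend: replaces each distinct input value
// with a random digit string and remembers the mapping for consistency.
class ConsistentMasker {
    private final Map<String, String> lookup = new HashMap<>();
    private final Random random;

    ConsistentMasker(long seed) {
        this.random = new Random(seed);
    }

    // Returns the same masked value every time the same original is seen.
    String mask(String original) {
        if (original == null) {
            return null;
        }
        return lookup.computeIfAbsent(original, key -> randomDigits(key.length()));
    }

    // Builds a random digit string of the requested length.
    private String randomDigits(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('0' + random.nextInt(10)));
        }
        return sb.toString();
    }
}
```

Calling `mask` twice with the same phone number returns the same surrogate within one run. Persisting the lookup (for example in a table) would be needed to keep the mapping stable across runs.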
One Star

Re: Data masking and anonymization using Talend

Thanks Shong and Kootstra for your valuable suggestions.
One Star

Re: Data masking and anonymization using Talend

Talend Open Studio is one of the best tools I have used in recent years, but for applying data masking I have been using Data Masker.
I do agree with Kootstra_a that replacing values with random ones, or doing a lookup/replace (to maintain consistency) using the map component, is an inexpensive approach, but it won't always stand up in your test use cases.
Check out http://www.datakitchen.com.au/ for info about data masking, data masking methodology, data discovery and test data management.

Re: Data masking and anonymization using Talend

The key suggestion by the 'data kitchen' (I loved their photo, very over the top)
is their discussion of 'Developing a Data Masking Methodology',
which can be achieved independently of the tool at hand.
So define your requirements, do your research, and then use Talend to deliver it.
I have recently implemented a similar workstream for a client and it works fine with Talend.
regards,
One Star

Re: Data masking and anonymization using Talend

Hi,
Can somebody explain how to have my job run with different schemas dynamically?
That is, my job has a tFileInputDelimited writing to a tLogRow, and I pass the file name to the tFileInputDelimited through a context variable. I want to display different file contents every time I run this job by passing a different file name, with a different schema, to the context variable.
How can I achieve that just by changing the file name provided in the context variable?
P.S.: The different files have different schemas.
One Star

Re: Data masking and anonymization using Talend

Data masking is actually a very complex topic. There is a lot of IP around it, so the more or less proven solutions are pretty expensive. There are simple ways to obfuscate data in ETL, but would some of you be interested in looking at a component suite that you can call from Talend via a simple API?
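As an illustration of the "simple ways to obfuscate data in ETL" mentioned above, here is a minimal Java sketch. The class and method names (`SimpleObfuscation`, `maskDigits`, `sha256Hex`) are hypothetical and not part of Talend. It shows two basic techniques for the kinds of fields raised in the original question: hiding all but the last few digits of a phone number, and replacing a value with a one-way SHA-256 digest. Note that an unsalted hash of a short or guessable value can be brute-forced, so this is not a substitute for a proper masking methodology.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical utility class for illustration only.
class SimpleObfuscation {

    // Replaces every digit with '*' except the last `visible` digits;
    // separators such as '-' or spaces are kept as they are.
    static String maskDigits(String value, int visible) {
        if (value == null) {
            return null;
        }
        int totalDigits = 0;
        for (char c : value.toCharArray()) {
            if (Character.isDigit(c)) {
                totalDigits++;
            }
        }
        int toMask = Math.max(0, totalDigits - visible);
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            if (Character.isDigit(c) && toMask > 0) {
                sb.append('*');
                toMask--;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns the lowercase hex SHA-256 digest of the UTF-8 bytes of `value`.
    static String sha256Hex(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

For example, `SimpleObfuscation.maskDigits("555-123-4567", 4)` returns `***-***-4567`.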
One Star

Re: Data masking and anonymization using Talend

Hi,
I wonder whether somebody has done a tutorial on data masking using Talend. A lot of people are talking in general terms, but a detailed tutorial would be most appreciated.
Thanks in advance
Moderator

Re: Data masking and anonymization using Talend

Hi spr655,

Here is a KB article about: TalendHelpCenter:How to setup encryption of the passwords in Talend Studio?
In addition, you can use tContextLoad or implicit tContextLoad to hide the password characters assigned to different context variables.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Data masking and anonymization using Talend

Dear Sabrina,
Thanks for the input. It is very interesting for future purposes. At the present time, my question is about how to replace values from input tables/files randomly with values of fake data tables. I have been checking possibilities with Talend, but I don't have that much experience with the product, so I didn't find a concrete solution.
The scenario is the following:
- As input, I have 2 different tables and one XML file that contain the fields id, first name, surname and date of birth.
- I have a table that is populated with invented ids, first names, surnames and dates of birth.
With Talend, for the same id (in the 2 tables and the XML file), I want to obtain the same fake values in the output (2 tables and the XML file), with the help of components or routines.
I have been looking at possible solutions and I guess Java routines will play an important role in this kind of job, because I couldn't find a specific component or components that would achieve my purpose.
Could you help me with this?
Thanks in advance
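One possible way to approach the scenario above in a Java routine is sketched below. All names (`FakeIdentityLookup`, `FakePerson`, `forRealId`) are hypothetical, and the sketch assumes the fake-data table has already been loaded into a list in a fixed order. The real id is hashed to an index into that list, so the same id yields the same fake record whether it comes from one of the tables or from the XML file. Different real ids can land on the same fake record, so this does not give a one-to-one mapping.

```java
import java.util.List;

// Hypothetical lookup for illustration; not a Talend component.
class FakeIdentityLookup {

    // One row of the table of invented values.
    static final class FakePerson {
        final String id;
        final String firstName;
        final String surname;
        final String dateOfBirth;

        FakePerson(String id, String firstName, String surname, String dateOfBirth) {
            this.id = id;
            this.firstName = firstName;
            this.surname = surname;
            this.dateOfBirth = dateOfBirth;
        }
    }

    private final List<FakePerson> fakes;

    FakeIdentityLookup(List<FakePerson> fakes) {
        if (fakes == null || fakes.isEmpty()) {
            throw new IllegalArgumentException("fake data list must not be empty");
        }
        this.fakes = fakes;
    }

    // Deterministically picks a fake record for a real id. String.hashCode()
    // is specified by the Java API, so the result is stable across runs as
    // long as the list content and order do not change.
    FakePerson forRealId(String realId) {
        int index = Math.floorMod(realId.hashCode(), fakes.size());
        return fakes.get(index);
    }
}
```

Each of the three inputs would call `forRealId` with its own id column and write the returned fields to the output in place of the real ones.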
Moderator

Re: Data masking and anonymization using Talend

Hi,
The scenario is the following:
- As input, I have 2 different tables and one XML file that contain the fields id, first name, surname and date of birth.
- I have a table that is populated with invented ids, first names, surnames and dates of birth.
With Talend, for the same id (in the 2 tables and the XML file), I want to obtain the same fake values in the output (2 tables and the XML file), with the help of components or routines.

Could you please elaborate on your case with an example showing input and expected output values?


Best regards
Sabrina
One Star

Re: Data masking and anonymization using Talend

Hi,
My post was treated as spam, so I had to upload it as images.
Thanks in advance for your help and support.
Moderator

Re: Data masking and anonymization using Talend

Hi,
Would you mind taking a look at a use case about Implicit tContextLoad in this forum: https://www.talendforge.org/forum/viewtopic.php?id=35478 to see if it satisfies your needs?
Best regards
Sabrina
One Star

Re: Data masking and anonymization using Talend

Hello Everyone,
I saw a lot of suggestions, but has anyone put an example of data masking with Talend into practice? I am trying to read up on and understand how data masking works so that I can implement it with Talend.
I would really appreciate any help.
Regards

Re: Data masking and anonymization using Talend

You cannot expect Talend to identify email addresses and phone numbers automatically in order to mask them. But you can mask specific keywords using tReplaceList.
Let's say you have a comments column in your file in which you want to hide specific keywords such as Account, Phone, Email or Password. It can be done.
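To make the keyword idea concrete, here is a rough Java sketch of the same kind of search-and-replace over a comments column. It is not the implementation of tReplaceList. `KeywordMasker` and `maskKeywords` are hypothetical names, and matching whole words case-insensitively is an assumption of this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class KeywordMasker {

    // Replaces each keyword (whole word, case-insensitive) with its replacement.
    static String maskKeywords(String text, Map<String, String> replacements) {
        if (text == null) {
            return null;
        }
        String result = text;
        for (Map.Entry<String, String> entry : replacements.entrySet()) {
            // Pattern.quote treats the keyword literally; \b limits matches to whole words.
            Pattern pattern = Pattern.compile(
                    "\\b" + Pattern.quote(entry.getKey()) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            // quoteReplacement keeps '$' and '\' in the replacement from being interpreted.
            result = pattern.matcher(result)
                    .replaceAll(Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }
}
```

With a map that sends both `Password` and `Phone` to `****`, the comment `My password is on the phone bill` becomes `My **** is on the **** bill`. This only hides the keywords themselves, not the sensitive values that may follow them.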