I am fairly new to Talend. My use case is this: I have two Excel tables with employee data. The columns are name, email, street and phone number. I need to find the employees common to both tables, based on phone number or street, and write that data to a third Excel sheet. I can do this using tUniqRow and tUnite. However, the phone number could have the format +1 8x9-201-1xx5 in one table and 8x9-201-1xx5 in the other. Likewise, the street field could be "Main street" in one table and "Main st" in the other. How can I deal with that? Should I use a tMap or a regex component? And how should I filter the data? Thank you very much!
Have a look at the tFuzzyMatch component, which helps with deduplication by comparing values using the Levenshtein, Metaphone or Double Metaphone algorithm.
It could probably help you solve this kind of use case.
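Fuzzy matching usually behaves better when both columns are normalized first, so that "+1 8x9-201-1xx5" and "8x9-201-1xx5" reduce to the same key before any comparison. Below is a minimal sketch in Java, the language Talend routines are written in. The class name `MatchKeys` and all of its methods are hypothetical, not part of Talend. It assumes North American numbers (an optional leading country code 1) and that "st" is the only street abbreviation to expand. The sample phone numbers are made up.

```java
import java.util.Locale;

// Hypothetical helper class; the names are illustrative, not a Talend API.
final class MatchKeys {

    // Keep digits only, then drop a leading "1" on 11-digit numbers
    // (assumption: North American numbers with an optional +1 prefix).
    static String normalizePhone(String raw) {
        if (raw == null) return "";
        String digits = raw.replaceAll("\\D", "");
        if (digits.length() == 11 && digits.startsWith("1")) {
            digits = digits.substring(1);
        }
        return digits;
    }

    // Lowercase, collapse whitespace, and expand a trailing "st" or "st."
    // to "street" (assumption: no other abbreviations need handling).
    static String normalizeStreet(String raw) {
        if (raw == null) return "";
        String s = raw.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        return s.replaceAll("\\bst\\.?$", "street");
    }

    // Classic Levenshtein edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Two rows match when the normalized phones are equal (and non-empty),
    // or the normalized streets are within maxDistance edits of each other.
    static boolean sameEmployee(String phoneA, String streetA,
                                String phoneB, String streetB, int maxDistance) {
        String pa = normalizePhone(phoneA);
        String pb = normalizePhone(phoneB);
        if (!pa.isEmpty() && pa.equals(pb)) return true;
        return levenshtein(normalizeStreet(streetA), normalizeStreet(streetB)) <= maxDistance;
    }

    public static void main(String[] args) {
        System.out.println(normalizePhone("+1 555-201-1234"));
        System.out.println(normalizeStreet("Main st"));
        System.out.println(sameEmployee("+1 555-201-1234", "Main st",
                                        "555-201-1234", "Main street", 2));
    }
}
```

One possible way to use this in a job is to put the methods in a custom routine, call them from tMap expressions to build a normalized key column on each input, and then join or fuzzy-match on those keys instead of the raw values.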
Let us know.