TFuzzyMatch using Levenshtein Method

Hi,
I want to understand the matching logic when there are multiple key attributes, using the Levenshtein method with a minimum distance of 0 and a maximum distance of 5. Specifically: are records categorized as duplicates when they meet just one of the criteria, or only when they meet all of them?
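For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a minimal Python sketch of the metric and of a min/max bound check. It assumes the bounds are inclusive (0 <= distance <= 5), which is my reading of the question rather than confirmed tFuzzyMatch behaviour, and `within_bounds` is a hypothetical helper name.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution (free if equal)
        previous = current
    return previous[-1]


def within_bounds(a: str, b: str, min_dist: int = 0, max_dist: int = 5) -> bool:
    """Hypothetical helper: True if the distance lies in [min_dist, max_dist] (inclusive, assumed)."""
    return min_dist <= levenshtein(a, b) <= max_dist


print(levenshtein("kitten", "sitting"))       # 3
print(within_bounds("Jonathan", "Jonathon"))  # True  (distance 1)
print(within_bounds("a", "abcdefgh"))         # False (distance 7)
```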

Re: TFuzzyMatch using Levenshtein Method

Mr.M,
If you build a compound key from multiple columns, then all of them are taken into account together for the match, not each one individually.
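One way to picture a compound key: concatenate the chosen columns into a single string and compare those strings as a whole, so the distance budget is shared across all the columns instead of being applied to each column separately. The sketch below assumes a `|` separator and uses hypothetical helper names (`compound_key`, `compound_match`); it illustrates the idea and is not Talend's internal implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def compound_key(record: dict, columns: list, sep: str = "|") -> str:
    """Hypothetical helper: join the chosen columns into one key (missing -> empty string)."""
    return sep.join(str(record.get(c) or "") for c in columns)


def compound_match(r1: dict, r2: dict, columns: list, max_dist: int = 5) -> bool:
    """One fuzzy comparison over the whole key: edits in any column count toward max_dist."""
    return levenshtein(compound_key(r1, columns), compound_key(r2, columns)) <= max_dist


a = {"first": "Jon", "last": "Smith"}
b = {"first": "John", "last": "Smyth"}
cols = ["first", "last"]
print(compound_key(a, cols))                                      # Jon|Smith
print(levenshtein(compound_key(a, cols), compound_key(b, cols)))  # 2
print(compound_match(a, b, cols))                                 # True
```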
To answer your question better, I would like to understand more about your data, use case and ultimate goal. There are several matching components; which one are you using? A screenshot of the job with the component settings would be very useful.

Re: TFuzzyMatch using Levenshtein Method

Hi,
We are trying to identify duplicate customers based on First Name, Last Name, Phone Number, Email, Address and Zip Code. I have applied exact matching on Phone Number and Zip Code, and the Levenshtein method on the other fields.
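A sketch of the setup described above, assuming every criterion must hold for a pair to count as duplicates: exact equality on phone and zip, and a Levenshtein distance within the maximum on the other fields. The column names are hypothetical, and whether tFuzzyMatch combines criteria this way is exactly the open question in this thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


EXACT_FIELDS = ("phone", "zip")                                 # hypothetical column names
FUZZY_FIELDS = ("first_name", "last_name", "email", "address")  # hypothetical column names


def is_duplicate(r1: dict, r2: dict, max_dist: int = 5) -> bool:
    """AND semantics: every exact field equal and every fuzzy field within max_dist."""
    if any(r1[f] != r2[f] for f in EXACT_FIELDS):
        return False
    return all(levenshtein(r1[f], r2[f]) <= max_dist for f in FUZZY_FIELDS)


a = {"first_name": "Jon", "last_name": "Smith", "email": "jon@example.com",
     "address": "12 High St", "phone": "5551234", "zip": "10001"}
b = dict(a, first_name="John", email="john@example.com")  # two small typos
c = dict(b, zip="10002")                                  # an exact field differs
print(is_duplicate(a, b))  # True
print(is_duplicate(a, c))  # False
```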

Re: TFuzzyMatch using Levenshtein Method

Also, I want to understand how the tFuzzyMatch logic treats missing values.

Re: TFuzzyMatch using Levenshtein Method

Hi,
Following on from that, I would also like to know whether Talend's fuzzy matching supports the features below.
Let us say I want to match on Name, Address, Email and Phone Number:
1. What if the fields are empty for some records, i.e. the fill rate is less than 100%? How does Talend handle matching in that scenario?
2. Can we specify multiple rules in one go, such as (Name, Address, Email, Phone Number) or (Name, Email, Phone Number) or (Name, Email) or (Name, Phone Number)? That is, if any of these 4 rules is satisfied, Talend should return the records as duplicates.
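The behaviour asked for in point 2 can be sketched as an OR over rules, where each rule is an AND over its fields. The sketch also has to pick a policy for point 1; it assumes that an empty field on either side never counts as a match, so any rule using that field cannot fire. That policy, the field names and the shared `max_dist` of 5 are assumptions, not documented Talend behaviour.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


RULES = [  # any rule may fire (OR); every field inside a rule must match (AND)
    ("name", "address", "email", "phone"),
    ("name", "email", "phone"),
    ("name", "email"),
    ("name", "phone"),
]


def is_empty(value) -> bool:
    return value is None or str(value).strip() == ""


def field_matches(a, b, max_dist: int = 5) -> bool:
    if is_empty(a) or is_empty(b):
        return False  # assumed policy: missing data never counts as a match
    return levenshtein(str(a), str(b)) <= max_dist


def is_duplicate(r1: dict, r2: dict, rules=RULES, max_dist: int = 5) -> bool:
    return any(all(field_matches(r1.get(f), r2.get(f), max_dist) for f in rule)
               for rule in rules)


a = {"name": "Jon Smith", "address": "", "email": "", "phone": "5551234"}
b = {"name": "John Smith", "address": "12 High St",
     "email": "john@example.com", "phone": "5551234"}
c = {"name": "John Smith"}  # no email and no phone
print(is_duplicate(a, b))  # True, via the (name, phone) rule
print(is_duplicate(b, c))  # False, every rule needs email or phone
```

With this policy, a record with no email and no address can still be flagged through the (Name, Phone Number) rule. Treating empty-vs-empty as a match instead would be a one-line change in `field_matches`, but then two records that both lack an email would match on that field.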

Re: TFuzzyMatch using Levenshtein Method

Hi,
I am using Talend Open Studio version 6.1. Is it possible to perform in-line matching using the tFuzzyMatch component? I want to match on more than one column, such as firstname, lastname, address, zip and phone number. Also, is it possible to get separate outputs for duplicate and unique values using this component?
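In-line matching within a single flow can be sketched like this: each incoming record is compared with the unique records kept so far; if it matches one of them on every key column, it goes to the duplicates output, otherwise it becomes a new unique record. The greedy, first-come grouping and the helper name `split_duplicates` are assumptions for illustration, not a description of any Talend component.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def split_duplicates(records, key_columns, max_dist=5):
    """Hypothetical helper: route each record to 'uniques' or 'duplicates'."""
    uniques, duplicates = [], []
    for rec in records:
        # A record is a duplicate if some already-kept unique matches on every key column.
        is_dup = any(
            all(levenshtein(rec[c], kept[c]) <= max_dist for c in key_columns)
            for kept in uniques
        )
        (duplicates if is_dup else uniques).append(rec)
    return uniques, duplicates


rows = [
    {"first": "Jon", "last": "Smith"},
    {"first": "John", "last": "Smith"},
    {"first": "Mary", "last": "Johnson"},
]
uniques, duplicates = split_duplicates(rows, ["first", "last"], max_dist=1)
print([r["first"] for r in uniques])     # ['Jon', 'Mary']
print([r["first"] for r in duplicates])  # ['John']
```

The pairwise comparison grows quadratically with the number of records, so this is only practical as written for small inputs.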

Re: TFuzzyMatch using Levenshtein Method

Hi,

For your in-line operation, could you please elaborate on your case with an example of input and expected output values?

There is a component, tRecordMatching (see the Talend Help Center), which joins two tables by doing a fuzzy match on several columns using a wide variety of comparison algorithms (you can define several keys).
Note: This component is only available in the Palette of Talend Studio if you have subscribed to one of the Talend Platform products.
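To illustrate the general idea of joining two tables on several keys, each with its own comparison function, here is a minimal sketch: it scores each main row against every lookup row, averages the per-key scores and accepts the best candidate if it reaches a threshold. The key definitions, the normalized-Levenshtein score, the equal weighting and the 0.8 threshold are all assumptions; tRecordMatching's actual options should be checked in the Talend Help Center.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


KEYS = {"name": similarity, "zip": exact}  # hypothetical key definitions, equal weights


def match_tables(main, lookup, keys=KEYS, threshold=0.8):
    """Return (matches, non_matches); each match is (main_row, best_lookup_row, score)."""
    matches, non_matches = [], []
    for row in main:
        best, best_score = None, -1.0
        for candidate in lookup:
            score = sum(fn(row[k], candidate[k]) for k, fn in keys.items()) / len(keys)
            if score > best_score:
                best, best_score = candidate, score
        if best is not None and best_score >= threshold:
            matches.append((row, best, best_score))
        else:
            non_matches.append(row)
    return matches, non_matches


main = [{"name": "Jon Smith", "zip": "10001"}, {"name": "Mary Jones", "zip": "20002"}]
lookup = [{"name": "John Smith", "zip": "10001"}]
matches, non_matches = match_tables(main, lookup)
print([(m[0]["name"], round(m[2], 2)) for m in matches])  # [('Jon Smith', 0.95)]
print([r["name"] for r in non_matches])                   # ['Mary Jones']
```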
Best regards
Sabrina