One Star

Matching using Talend Open studio for DQ (community version)

Hi,
I am trying to match two CSV files on multiple columns (for example first name, last name, date of birth) using TOS for DQ. I found a way to do it with a matching analysis, but I am facing a challenge:
In the matching analysis perspective, it seems I can run the matching on only one CSV file, but I have two files (one input file and one reference or master file).
My question is: how can I match the two CSV files using the matching analysis in DQ?
thank you
Regards
3 REPLIES
Moderator

Re: Matching using Talend Open studio for DQ (community version)

Hi,
Could the component TalendHelpCenter:tFuzzyMatch meet your needs? It compares a column from the main flow with a reference column from the lookup flow and outputs the main flow data along with the distance.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Matching using Talend Open studio for DQ (community version)

hi xdshi,
I already tried this component, but it does not offer many algorithms (Soundex, q-gram, ... are not included). Also, the Levenshtein algorithm does not provide a score (between 0 and 1) but a distance.
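To illustrate the distance versus score point: an edit distance can be turned into a score between 0 and 1 by dividing by the length of the longer string. The following is only a minimal sketch in plain Python, outside Talend; the function names `levenshtein` and `similarity` are made up for this example, and this normalization is one common convention, not necessarily the one any Talend component uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to a score in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest
```

For example, `levenshtein("kitten", "sitting")` is 3, so `similarity("kitten", "sitting")` is 1 - 3/7, roughly 0.57.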
The other challenge I face is that I need to get from the reference file all the rows that are similar to rows in the main input file. Here is a diagram of the job I built using tFuzzyMatch:
                                  tfileinputdelimited(reference flow)                       
                                                           ||
tfileinputdelimited(input flow)=====>tfuzzymatch====> matched rows from the input flow.
Instead of the matched rows from the input flow, I want the matched rows from the reference flow.
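The requirement above (multi-column fuzzy matching that returns the reference rows) can be sketched outside Talend as follows. This is only an illustration under stated assumptions: the column names, the sample data and the 0.85 threshold are hypothetical, and the per-column score uses Python's standard `difflib.SequenceMatcher` ratio rather than any algorithm from tFuzzyMatch or the matching analysis.

```python
import csv
import io
from difflib import SequenceMatcher

# Hypothetical column names and cut-off, chosen only for this example.
MATCH_COLUMNS = ["first_name", "last_name", "date_of_birth"]
THRESHOLD = 0.85


def row_score(input_row, reference_row):
    """Average of the per-column similarity ratios, each in [0, 1]."""
    scores = [
        SequenceMatcher(None,
                        input_row[col].strip().lower(),
                        reference_row[col].strip().lower()).ratio()
        for col in MATCH_COLUMNS
    ]
    return sum(scores) / len(scores)


def matching_reference_rows(input_rows, reference_rows, threshold=THRESHOLD):
    """Return (reference_row, input_row, score) for every pair at or above the threshold."""
    matches = []
    for ref in reference_rows:
        for inp in input_rows:
            score = row_score(inp, ref)
            if score >= threshold:
                matches.append((ref, inp, round(score, 3)))
    return matches


def read_rows(text):
    """Parse CSV text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample data standing in for the two CSV files.
input_rows = read_rows(
    "first_name,last_name,date_of_birth\n"
    "Jon,Smith,1980-01-02\n"
)
reference_rows = read_rows(
    "first_name,last_name,date_of_birth\n"
    "John,Smith,1980-01-02\n"
    "Mary,Jones,1975-06-30\n"
)

matches = matching_reference_rows(input_rows, reference_rows)
for ref, inp, score in matches:
    print(ref, "matches input", inp, "with score", score)
```

The outer loop runs over the reference rows, so the output is driven by the reference file rather than the input file. Comparing every pair is quadratic in the number of rows, so for large files some form of blocking (for example, comparing only rows that share a birth year) would probably be needed.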
thank you
Moderator

Re: Matching using Talend Open studio for DQ (community version)

Hi,
There is a component, tRecordMatching, which can join two tables by doing a fuzzy match on several columns using a wide variety of comparison algorithms. However, this component is available in the Palette of Talend Studio only if you have subscribed to one of the Talend Platform products; it is not part of the open source edition.
Best regards
Sabrina