Fuzzy logic processing

One Star

Fuzzy logic processing

I am trying to use the fuzzy logic module to identify companies by name in a text string, using a lookup file of clean company names.
I want to filter the results with tFilterRow, but I am missing something, because the company names are not being matched. The text field being processed is free-form and can contain extraneous characters.
I have tried all three matching modes and so far nothing comes out right. It is not even close.

I am trying to keep only the rows whose value column, which is added to the data as an indication of a match, is <= 1. Even then, every row is treated as a match.
Community Manager

Re: Fuzzy logic processing

Hello
First, there is a compilation error in the generated code. The data type of the column is String, so why do you use the 'lower than' operator? 'Lower than' is meant for the int/Integer type.
Can you give us an example of your input and the result you expect?
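
To illustrate the type mismatch described above, here is a minimal Java sketch of comparing a String-typed score against a numeric threshold. The helper name `isCloseMatch` is hypothetical and is not part of Talend; it only assumes the score arrives as a String that holds an integer.

```java
class ValueFilterDemo {
    // Hypothetical helper: true when a String-typed score parses to an integer <= threshold.
    // Writing value <= 1 directly on a String does not compile in Java, so parse first.
    static boolean isCloseMatch(String value, int threshold) {
        if (value == null) {
            return false;
        }
        try {
            return Integer.parseInt(value.trim()) <= threshold;
        } catch (NumberFormatException e) {
            // A non-numeric score is treated as "no match" in this sketch.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isCloseMatch("1", 1));
        System.out.println(isCloseMatch("21", 1));
    }
}
```

An expression of this shape could be adapted to a filter condition, provided the column really holds integer text.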
Best regards
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: Fuzzy logic processing

I have a lookup file of clean company names in a text file.
Microsoft
Boeing
At&T
etc.
I have a text string field in a Teradata table that contains rows with company names that may be followed by other extraneous characters.
Boeing mcc=400 800-777-7777
AT&T retail store 4212
Pandera Bakery Store 4356 510-222-2085

I need to find the records in the Teradata table that match the company names in the lookup and insert those records into another Teradata table.
I am using tFilterRow as a means to identify the rows that very closely match a company name in the lookup. The output contains two additional fields: match and value.
Value seems to be an indicator of how well the row matches. Match appears to be a random assortment of alphabetic characters that do not match anything in the lookup.
I would think that if the values are close (x% of the characters match), the component would output the matching name.

I assumed that any value lower than 1 would possibly indicate a very close match. That does not appear to be the case.
Perhaps I am missing something in understanding how to use the fuzzy logic component.
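
One possible explanation, sketched here under an assumption the thread does not confirm: if the value column is a Levenshtein edit distance between the whole input field and a lookup entry, then the extra characters in the field count as edits, and a threshold of 1 can never be met. The class below is a plain Java illustration of that distance, not the component's actual code.

```java
class LevenshteinDemo {
    // Classic dynamic-programming edit distance:
    // insertion, deletion and substitution each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Swap the two rows so that prev always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String text = "Boeing mcc=400 800-777-7777";
        String[] lookup = {"Microsoft", "Boeing", "At&T"};
        for (String name : lookup) {
            System.out.println(name + " -> " + levenshtein(text, name));
        }
    }
}
```

Under that assumption, the distance between "Boeing mcc=400 800-777-7777" and "Boeing" is 21, because the 21 trailing characters all have to be deleted, even though the company name itself is an exact prefix.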
Community Manager

Re: Fuzzy logic processing

Hello
You have misunderstood how the fuzzy logic component works; please read the usage example in the documentation.
For your request, you can use the tMap and tFilterRow components to do it. Please see my screenshots.
in.txt:

Boeing mcc=400 800-777-7777
AT&T retail store 4212
Pandera Bakery Store 4356 510-222-2085

lookup.txt:

Microsoft
Boeing
At&T

result:
Starting job forum10495 at 14:54 06/04/2010.
connecting to socket on port 3863
connected
.---------------------------+-----------.
|               tLogRow_1               |
|=--------------------------+----------=|
|text                       |companyName|
|=--------------------------+----------=|
|Boeing mcc=400 800-777-7777|Boeing     |
|AT&T retail store 4212     |At&T       |
'---------------------------+-----------'
disconnected
Job forum10495 ended at 14:54 06/04/2010.
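
Since the screenshots are not reproduced here, the following is only a sketch of one rule that yields the same result: a case-insensitive "contains" test of each lookup name against the text. The class and method names are hypothetical, and the actual tMap expression used in the job may differ.

```java
import java.util.Locale;

class CompanyLookupDemo {
    // Returns the first lookup name contained in the text, ignoring case,
    // or null when no lookup name occurs in the text.
    static String findCompany(String text, String[] lookup) {
        String haystack = text.toLowerCase(Locale.ROOT);
        for (String name : lookup) {
            if (haystack.contains(name.toLowerCase(Locale.ROOT))) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] lookup = {"Microsoft", "Boeing", "At&T"};
        String[] rows = {
            "Boeing mcc=400 800-777-7777",
            "AT&T retail store 4212",
            "Pandera Bakery Store 4356 510-222-2085"
        };
        for (String row : rows) {
            String company = findCompany(row, lookup);
            // Rows without a matching company are dropped, as a filter step would do.
            if (company != null) {
                System.out.println(row + " | " + company);
            }
        }
    }
}
```

With this rule the "Pandera Bakery" row is dropped because none of the lookup names occurs in it, which is consistent with the two rows shown in the result.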

Best regards
Shong

Hi Shong,
Can we match on multiple columns using fuzzy matching?
We are using version 6.1.1 of the DI studio.
Regards,
PP