
Where to see the validation used to mark a record valid/invalid

Hello,

When opening the "Customer Marketing Leads" example, it shows:
- in each column header, the type of data the column contains (e.g. LAST_NAME is indicated as "us_county")
- records that are invalid (I assume according to the selected profile)

Where can one see these 'profiles' (e.g. us_county, email, city, airport) that are listed when you open the dropdown in the column header?

Can you define these profiles yourself? If so, where?

Thx.

br,
 Ruben

Employee

Re: Where to see the validation used to mark a record valid/invalid

- We don't have a last-name dictionary yet, so the semantic analyzer picked the next best match from another dictionary, which happens to be US county (apparently counties in the US are often named after people: https://en.wikipedia.org/wiki/County_(United_States)#County_names).
- Overall, the philosophy of the product is to guess the type, but you can always change it.
- We have three kinds of reference data:
  1) regexes for syntax recognition, for instance emails
  2) open dictionaries: reference data that is inherently incomplete, such as first names
  3) closed dictionaries, based on standardization bodies whenever possible, such as country names
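To illustrate how these three kinds of reference data can drive both the type guess and the valid/invalid marking, here is a minimal toy sketch in Python. It is not Talend's implementation: the category names, the tiny dictionaries, the email regex, and the helpers `matches`, `guess_type` and `mark_validity` are all hypothetical. It simply picks the category that matches the largest share of a column's values, then flags each value as valid or invalid against that category.

```python
import re

# Hypothetical toy reference data, NOT Talend's actual dictionaries.
# 1) Regexes: syntax-based recognition.
REGEXES = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
}

# 2) Open dictionaries: known values, but inherently incomplete.
OPEN_DICTIONARIES = {
    "FIRST_NAME": {"john", "mary", "ruben"},
}

# 3) Closed dictionaries: (ideally) complete lists from a standard.
CLOSED_DICTIONARIES = {
    "COUNTRY": {"france", "germany", "united states"},
    "US_COUNTY": {"washington", "jefferson", "franklin", "lincoln"},
}

ALL_CATEGORIES = list(REGEXES) + list(OPEN_DICTIONARIES) + list(CLOSED_DICTIONARIES)


def matches(category, value):
    """Return True if the value fits the given category's reference data."""
    v = value.strip()
    if category in REGEXES:
        return bool(REGEXES[category].match(v))
    for dictionaries in (OPEN_DICTIONARIES, CLOSED_DICTIONARIES):
        if category in dictionaries:
            # Dictionary lookups are case-insensitive in this sketch.
            return v.lower() in dictionaries[category]
    return False


def guess_type(values, categories=ALL_CATEGORIES):
    """Return the category matching the largest share of non-empty values, or None."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return None
    best, best_ratio = None, 0.0
    for category in categories:
        ratio = sum(matches(category, v) for v in non_empty) / len(non_empty)
        if ratio > best_ratio:  # strict '>' keeps the first category on ties
            best, best_ratio = category, ratio
    return best


def mark_validity(values, category):
    """Pair each value with True (valid) or False (invalid) for the category."""
    return [(v, matches(category, v)) for v in values]
```

With no last-name dictionary available, a column like `["Washington", "Smith", "Lincoln", "Jefferson"]` is guessed as `US_COUNTY` (3 of 4 values match), and `mark_validity` then flags "Smith" as invalid, mirroring the LAST_NAME behaviour described above. Changing the type, as the product allows, amounts to calling `mark_validity` with a different category.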
There is no tool in the product to let you browse the dictionaries, and there is currently no way to modify or enrich them (yet). But since we are open source, you may check out:
https://jira.talendforge.org/browse/TDQ-10996
and
https://github.com/Talend/data-quality/blob/master/dataquality-semantic/src/main/resources/luceneIdx...