Four Stars

Talend Studio - Data Quality - MS SQL Regular Expression errors

Hi community.

 

I have been unable to find any solution online regarding the issue on Data Quality (Profiler) and Microsoft SQL (2012). (This is not Open Studio edition but still encounter this issue when I did my research prior to getting subscription)

 

When trying to use the default indicator patterns/rules; getting errors on Regular Expressions not supported  when connected to DB; not an issue from data files / java in-memory.

 

"Unsupported use of regular expressions on this database. Remove all pattern indicators from this analysis,please."

 

I know MS doesn't support natively so not sure what the Talend recommendation is.


Does anyone have a recommendation on how to best handle this so MS SQL database will work other than having to completely re-writing the existing SQL to sort of work for Microsoft?

 

Greatly appreciated.

  • Data Quality
1 ACCEPTED SOLUTION

Accepted Solutions
Moderator

Re: Talend Studio - Data Quality - MS SQL Regular Expression errors

Hello,

Could you please take a look at this document about:TalendHelpCenter:How to deploy the regular expression function to the SQL server?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
2 REPLIES
Moderator

Re: Talend Studio - Data Quality - MS SQL Regular Expression errors

Hello,

Could you please take a look at this document about:TalendHelpCenter:How to deploy the regular expression function to the SQL server?

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
Four Stars

Re: Talend Studio - Data Quality - MS SQL Regular Expression errors

New to Talend, so hopefully my solution implementation in case others encounter as I spent some time on this

The Talend link response [https://help.talend.com/reader/fE51zt3EBkq3lp1Op6DPEw/V_1qeGGLLb8Li~d0ROXJSg] is helpful in providing some high level process flow but the documentation is bit dated (VS2008) and confusing instructions (to me).

My Steps to resolve/implement.

1) VisualStudio 2013 (SQL CLR )
I used VS2013( still dated I know) to build the SQL CRL user defined function to deploy to database.
*plenty of resources on how to do this; but here is a link I partial used for. (https://www.codeproject.com/Tips/841439/Create-Run-Debug-and-Deploy-SQL-CLR-Function-with)

Used the same c# method from Talend documentation for RegExMatch function

Deploy to server

**Make sure the SQL server settings are enabled for CLR
EXECUTE sp_configure 'clr enable', 1;  < can build into VS project to do pre-deployment script

my udf method called "dbo.RegExMatch"

2) Talend Studio DQ Profiler updates

A)  Update the library "indicator definition"
- Libraries > Indicators > System Indicators > Pattern Matching > Regular Expression Matching
- Add NEW Indicator Definition for Microsoft SQL Server
 
SELECT COUNT(CASE WHEN dbo.RegExMatch (<%=__COLUMN_NAMES__%> ,<%=__PATTERN_EXPR__%> ) = 1 THEN 1 END), COUNT(*) FROM <%=__TABLE_NAME__%> <%=__WHERE_CLAUSE__%>

See figure "DQProfiler_2A"

B)
Then under each of the pre-built regex functions; need to add "Microsoft SQL Server" to the Pattern Definition so that it knows to use the above Indicator Definition ;

 

For example:  Libraries > Patterns > Regex > address > "US Zipcode Validation"  
I just added the new Pattern Definition by copying the Default value and using "Microsoft SQL Server" as the new type

See figure "DQProfiler_2B"


3) Analysis now works in DQ Profiler
Success! Now regex indicators work in MS SQL tables.

Issues:
The report analysis will fail with trying to drill down in the analysis report graphs as Talend doesn't know how to build the Microsoft sql script

Example "View Valid/invalid row" WILL throw another error
"An expression of non-boolean type specified in a context where a condition is expected, near ')'

Talend SQL Editor shows (Talend default sql )
SELECT *  FROM "Talend_test"."dbo"."Txxxxx"  WHERE  DBO.REGEXMATCH( "MNSState",'^(A[KLRZ]|C[AOT]|DE|FL|GA|HI|I[ADLN]|K[SY]|LA|M[ADEINOST]|N[CDEHJMVY]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[AT]|W[AIVY])$' )

Manually Correct Talend SQL Editor query so it returns results (based on SQL CLR udf)
SELECT * FROM "Talend_test"."dbo"."Txxxxx"  WHERE  DBO.REGEXMATCH( "MNSState",'^(A[KLRZ]|C[AOT]|DE|FL|GA|HI|I[ADLN]|K[SY]|LA|M[ADEINOST]|N[CDEHJMVY]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[AT]|W[AIVY])$' ) = 1
 * this now returns rows based on condition (true) ; otherwise =0 for false condition


Hope this helps others using Microsoft SQL server and I'm sure I'll learn some better usages as I get more familiar with.

 

Thanks