Talend open studio for DQ: Regex pattern ^.{50}$ does not match when field begins or ends with spaces. Why??

Four Stars


I’m new to Talend and am trying to use the profiler tool to analyse data in a CSV file. Values in one of the columns have to be exactly 50 characters long (any characters are allowed, and spaces count towards the length).

I’ve tried the regex patterns ^.{50}$ and ^[\S\s]{50}$, but I don’t get matches when a value in the column begins and/or ends with spaces (in my tests the total length including the spaces is 50, but there is no match). Does anyone have an idea why?

Moderator

Re: Talend open studio for DQ: Regex pattern ^.{50}$ does not match when field begins or ends with spaces. Why??

Hello,

If we understand your requirement correctly, you could try ^[a-zA-Z,]{1,50}$

Could you post a sample of your input data here? An example showing the input values and the expected result for each would help.

Best regards

Sabrina

 

Four Stars

Re: Talend open studio for DQ: Regex pattern ^.{50}$ does not match when field begins or ends with spaces. Why??

Hi Sabrina, thanks for your response.

I want to make sure the value in the column is EXACTLY 50 characters long. That is the goal of this expression. I believe your regex matches only the letters a-z and A-Z (plus commas), with a minimum length of 1 and a maximum of 50.
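To illustrate the difference between the two patterns, here is a minimal sketch in Python. Python's `re` module is only a stand-in here (an assumption: the engine Talend uses may behave differently), and the variable names are made up for the example.

```python
import re

# Pattern suggested in the reply: 1 to 50 letters or commas.
suggested = re.compile(r"^[a-zA-Z,]{1,50}$")
# Pattern from the original post: exactly 50 characters of any kind.
exact = re.compile(r"^.{50}$")

short_value = "abc"              # 3 letters, far fewer than 50
padded_value = " " * 48 + "aa"   # 50 characters, 48 of them leading spaces

# The suggested pattern accepts the short value, so it cannot enforce an exact length,
# and it rejects the padded value because spaces are not in its character class.
suggested_short = suggested.match(short_value) is not None    # True
suggested_padded = suggested.match(padded_value) is not None  # False

# The original pattern behaves the other way round.
exact_short = exact.match(short_value) is not None    # False
exact_padded = exact.match(padded_value) is not None  # True

print(suggested_short, suggested_padded, exact_short, exact_padded)
```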

 

I want it to match for example:

' gggdhfjgkdlj jhdgsyru odk837<o +d-sgayr29l sjflm ' (50 characters with space in the beginning and end)

'                                                aa' (50 characters with 48 spaces in the beginning)

'aa                                                ' (50 characters with 48 spaces in the end)

 

I don't want to match values in a column that contains more than 50 characters. Does that make sense?

 

However, in Talend the regex from my original post only matches values that start AND end with a character that is NOT a space, e.g.:

'pfagdhfjgkdlj jhdgsyru odk837<o +d-sgayr29l sjflmo' (50 characters with a non-space character in the beginning and end)

'a                                               aa' (50 characters with a non-space character in the beginning and end)

'aa                                               a' (50 characters with a non-space character in the beginning and end)
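As a sanity check outside Talend, here is a minimal Python sketch that runs both patterns against the six sample values above. It uses Python's `re` module as a stand-in engine, which is an assumption about how the patterns should behave, not a statement about Talend internals. The `strip()` step is purely hypothetical: nothing in this thread confirms that Talend or the CSV connection trims values, but trimming before matching would produce exactly the behaviour I am seeing.

```python
import re

PATTERNS = [r"^.{50}$", r"^[\S\s]{50}$"]

# The six sample values from this post, rebuilt so each is exactly 50 characters.
samples = [
    " gggdhfjgkdlj jhdgsyru odk837<o +d-sgayr29l sjflm ",
    " " * 48 + "aa",
    "aa" + " " * 48,
    "pfagdhfjgkdlj jhdgsyru odk837<o +d-sgayr29l sjflmo",
    "a" + " " * 47 + "aa",
    "aa" + " " * 47 + "a",
]


def matches_all(value: str) -> bool:
    """Return True if the value matches both patterns."""
    return all(re.match(p, value) is not None for p in PATTERNS)


# Matching the raw values: every sample is 50 characters long, so all match.
raw_results = [matches_all(v) for v in samples]

# Hypothetical: strip surrounding whitespace first (an assumption, not confirmed
# Talend behaviour). Only values without leading/trailing spaces stay at 50.
trimmed_results = [matches_all(v.strip()) for v in samples]

print(raw_results)      # [True, True, True, True, True, True]
print(trimmed_results)  # [False, False, False, True, True, True]
```

If the values really are being trimmed somewhere before the pattern is applied, the place to look would be the file connection or analysis settings rather than the regex itself.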

 

Appreciate your help!
