Six Stars

Regex extract 2 values (dates) from string using tExtractRegExFields

I could do with some help with tExtractRegexFields from the community. Given the string below

some text to begin with Start Date : '01-SEP-2016' , End Date : '30-SEP-2016'. Download 30-SEP-16

I want to extract Start Date and End Date values. I have the following RegEx

"(\\d{2}-\\D{3}-\\d{4})"

which extracts a valid value but not both dates (I get the download date in a later processing step). My output schema is a generic schema of two nullable strings, but only the second date is extracted, and it is stored in the first schema position.

 

When I use http://regexr.com/ and remove the escape "\" characters, the regex above becomes

(\d{2}-\D{3}-\d{4})

and it highlights both dates, which seems to imply the regex is valid for both date values; it just won't extract both of them.

Can anyone give me some pointers on capturing the first and second dates and storing them in the correct positions in the output?
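
A minimal plain-Java sketch of the behaviour described above, assuming tExtractRegexFields builds on java.util.regex (how the component maps matches to schema columns is not shown in this thread and is not modelled here). The pattern has a single capturing group, so each individual match exposes only one value; the two dates show up as two separate matches, which is what a "highlight all matches" tool displays. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDates {
    // Sample string from the question.
    static final String INPUT =
        "some text to begin with Start Date : '01-SEP-2016' , End Date : '30-SEP-2016'. Download 30-SEP-16";

    // Collects group 1 of every successive match of the single-group pattern.
    static List<String> findAll(String input) {
        Pattern p = Pattern.compile("(\\d{2}-\\D{3}-\\d{4})");
        Matcher m = p.matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group(1)); // one capturing group, so one value per match
        }
        return hits;
    }

    public static void main(String[] args) {
        // Prints [01-SEP-2016, 30-SEP-2016]; "30-SEP-16" has only two year digits and is skipped.
        System.out.println(findAll(INPUT));
    }
}
```

To get both dates out of a single match, the pattern itself needs two capturing groups.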


Accepted Solutions
Nine Stars TRF

Re: Regex extract 2 values (dates) from string using tExtractRegExFields

Hi,

Define the tExtractRegexFields schema as:

tExtractRegexFields.PNG

and the regex as:

"(^.*)(\\d{2}-\\D{3}-\\d{4})(.*)(\\d{2}-\\D{3}-\\d{4})(.*$)"

Here is the result from tLogRow:

result.png
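
Since the screenshots are not reproduced here, the following plain-Java sketch shows what the five capturing groups of this regex hold for the sample string from the question. It assumes the component relies on java.util.regex and that each capturing group feeds one schema column in order; both are assumptions, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FiveGroupDemo {
    // Sample string from the question.
    static final String INPUT =
        "some text to begin with Start Date : '01-SEP-2016' , End Date : '30-SEP-2016'. Download 30-SEP-16";

    // The regex from the accepted solution: leading text, date, middle text, date, trailing text.
    static final Pattern P = Pattern.compile(
        "(^.*)(\\d{2}-\\D{3}-\\d{4})(.*)(\\d{2}-\\D{3}-\\d{4})(.*$)");

    // Returns the contents of groups 1..5, or null when the whole input does not match.
    static String[] extract(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = extract(INPUT);
        for (int i = 0; i < groups.length; i++) {
            System.out.println("group " + (i + 1) + ": [" + groups[i] + "]");
        }
        // Groups 2 and 4 hold 01-SEP-2016 and 30-SEP-2016.
    }
}
```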

Hope this helps,

 


TRF
4 REPLIES
Six Stars

Re: Regex extract 2 values (dates) from string using tExtractRegExFields

The following regex strings either generate an error or don't extract both dates:

"(\\d{2}-\\D{3}-\\d{4}).(\\d{2}-\\D{3}-\\d{4})"
"\\+(\\d{2}-\\D{3}-\\d{4})"
"(\\d{2}-\\D{3}-\\d{4})//g"
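
A short plain-Java check of these three attempts against the sample string, as a sketch that assumes the component uses java.util.regex. In plain Java all three compile, but none finds a match: the first allows exactly one character between the dates, the second requires a literal "+" before the date, and the third requires the literal text "//g" after it (the /g flag syntax belongs to other regex flavours). Any error seen in the job would therefore have to come from somewhere other than pattern compilation; that part is a guess. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class FailedAttemptsDemo {
    // Sample string from the question.
    static final String INPUT =
        "some text to begin with Start Date : '01-SEP-2016' , End Date : '30-SEP-2016'. Download 30-SEP-16";

    static final String[] ATTEMPTS = {
        "(\\d{2}-\\D{3}-\\d{4}).(\\d{2}-\\D{3}-\\d{4})", // "." is exactly one character
        "\\+(\\d{2}-\\D{3}-\\d{4})",                      // "\+" is a literal plus sign
        "(\\d{2}-\\D{3}-\\d{4})//g"                       // "//g" is matched literally
    };

    // True when the regex matches somewhere in the input.
    static boolean matchesAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        for (String regex : ATTEMPTS) {
            System.out.println(regex + " -> " + matchesAnywhere(regex, INPUT));
        }
    }
}
```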
Six Stars

Re: Regex extract 2 values (dates) from string using tExtractRegExFields

Thanks. The .* you had inside the capture group to extract the middle text (which I'm reading as zero or more characters) was what I needed, followed by the date extractor again, which gave me

"(\\d{2}-\\D{3}-\\d{4}).*(\\d{2}-\\D{3}-\\d{4})"
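
A plain-Java sketch of this shorter two-group variant, assuming java.util.regex semantics; the class and method names are hypothetical. Here the .* sits outside any capturing group, so a single match exposes exactly two values, one per date.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoGroupDemo {
    // Sample string from the question.
    static final String INPUT =
        "some text to begin with Start Date : '01-SEP-2016' , End Date : '30-SEP-2016'. Download 30-SEP-16";

    // Two capturing groups separated by an uncaptured, greedy ".*".
    static final Pattern P = Pattern.compile(
        "(\\d{2}-\\D{3}-\\d{4}).*(\\d{2}-\\D{3}-\\d{4})");

    // Returns {first date, second date}, or null when no match is found.
    static String[] extract(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] dates = extract(INPUT);
        System.out.println("start=" + dates[0] + ", end=" + dates[1]);
        // Prints start=01-SEP-2016, end=30-SEP-2016
    }
}
```

Because .* is greedy, an input containing more than two matching dates would put the last one in the second group; a reluctant .*? would pick the next one instead.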
Ten Stars

Re: Regex extract 2 values (dates) from string using tExtractRegExFields

Well, that escalated quickly. I missed three replies while typing my own!