Regex to get middle of a string with known character boundaries

One Star

Hi, can I use a regex in tMap to get only what's in the .* portion of this string?
(<br>Specific Text: ).*(</br>)
If not, what's the best way to go about this?
Thank you.
Seventeen Stars

Re: Regex to get middle of a string with known character boundaries

I would use a routine like this:
http://www.talendforge.org/exchange/index.php?eid=1054&product=tos&action=view&nav=1,1,1
I would put the portion of text you want to extract inside a regex capturing group and leave everything surrounding it outside the group.
One Star

Re: Regex to get middle of a string with known character boundaries

Hi, and thanks for sharing this.
EDIT: I see now that this is a user routine. I have installed it.
One Star

Re: Regex to get middle of a string with known character boundaries

Would you give me an example of how to use the extractByRegexGroup expression given the string in my original question?
Seventeen Stars

Re: Regex to get middle of a string with known character boundaries

This is the regex you need: <br>Specific Text: (.*)</br>
Use 1 as the index of the group to extract (regex capturing groups are numbered starting at 1; group 0 is the whole match).
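For readers without the routine installed, here is a minimal sketch of what group extraction looks like with plain java.util.regex. The class name RegexGroupSketch and the method extractGroup are hypothetical stand-ins for illustration; this is not the source of the routine linked above, and returning null when nothing matches is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexGroupSketch {
    // Hypothetical helper: returns the text captured by the given group,
    // or null when the input is null or the pattern is not found.
    static String extractGroup(String input, String regex, int group) {
        if (input == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        // find() looks for the pattern anywhere in the input,
        // so the surrounding content does not need to be matched.
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String field = "intro <br>Specific Text: hello world</br> outro";
        // Group 1 is the part in parentheses; group 0 would be the whole match.
        System.out.println(extractGroup(field, "<br>Specific Text: (.*)</br>", 1));
    }
}
```

In a tMap expression the equivalent call would be the routine itself, with the input column as the first argument.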
One Star

Re: Regex to get middle of a string with known character boundaries

Hi, using this in a tMap I am attempting the following:
RegexUtil.extractByRegexGroup(MyTable.MyField,"<br>Purchase Timeframe: (.*)</br>",1)
But nothing is coming through into the table
Seventeen Stars

Re: Regex to get middle of a string with known character boundaries

Could you please check one of your datasets with a regex test tool? As you can see in my picture, the regex works.
This routine has worked in my projects for a couple of years, and I am absolutely sure the problem is in your data or your job.
One Star

Re: Regex to get middle of a string with known character boundaries

Hi, thanks for sticking with me. My regex was wrong; the closing delimiter in my data is <br>, not </br>, so it should have been:
"<br>Purchase Timeframe: (.*)<br>"
Not:
"<br>Purchase Timeframe: (.*)</br>"
One Star

Re: Regex to get middle of a string with known character boundaries

OK, I have had partial success, but a good portion of the rows are being rejected with a Data Truncation error.
Again here is my tMap expression:
RegexUtil.extractByRegexGroup(tablename.fieldname,"<br>Purchase Timeframe: (.*)<br>",1)
From what I can see, the rows that make it through properly are the ones in which the Purchase Timeframe value is the last entry in the field.
So for example, a field like this:
".....contentcontentcontent.... <br>Purchase Timeframe: One Month<br>"
Gets through and appears perfectly in the target table as:
One Month
But a field like this:
".....contentcontentcontent.... <br>Purchase Timeframe: One Week<br>Will Finance Purchase: Yes<br>I Have a Trade-in: No<br>"
Fails and my tLogRow rejects output looks like this for the row:
One Week<br>Will Finance Purchase: Yes<br>I Have a Trade-in: No||||Data truncation: Data too long for column 'BuyBy' at row 1 - Line: 290
The only difference I can see so far is that, in the rows that made it through, the value is at the end of the field.
I will keep looking to see if I can find any other difference.
Thank you.
One Star

Re: Regex to get middle of a string with known character boundaries

I think I have some valid results by adding a ? to the regex:
RegexUtil.extractByRegexGroup(tablename.fieldname,"<br>Purchase Timeframe: (.*?)<br>",1)
I am no longer getting any Data Truncation errors.
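The reason the ? helps is that .* is greedy and runs to the last <br> in the field, while .*? is reluctant and stops at the first one. A small sketch with plain java.util.regex, using the sample field from earlier in the thread; the class GreedyVsReluctant and its firstGroup helper are hypothetical names for illustration, not part of the routine:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Hypothetical helper: returns capturing group 1 of the first match, or null.
    static String firstGroup(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String field = "content <br>Purchase Timeframe: One Week"
                + "<br>Will Finance Purchase: Yes<br>I Have a Trade-in: No<br>";

        // Greedy: the group extends up to the LAST <br> in the field.
        System.out.println(firstGroup(field, "<br>Purchase Timeframe: (.*)<br>"));
        // prints: One Week<br>Will Finance Purchase: Yes<br>I Have a Trade-in: No

        // Reluctant: the group ends at the FIRST <br> after the label.
        System.out.println(firstGroup(field, "<br>Purchase Timeframe: (.*?)<br>"));
        // prints: One Week
    }
}
```

The greedy result is the same oversized value that showed up in the rejected row, which is why it overflowed the target column.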
Seventeen Stars

Re: Regex to get middle of a string with known character boundaries

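Fine. You could also replace the star * in your regex with a bounded quantifier such as {0,1000}, which accepts anything from no content up to 1000 characters. Keep the trailing ? (that is, .{0,1000}?) so the match still stops at the first <br>.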
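A short sketch of the bounded variant, again with plain java.util.regex. Java requires an explicit lower bound in a counted quantifier, so the sketch writes it as {0,1000}. The class BoundedQuantifierSketch and its helpers are hypothetical names; the bound of 1000 is only the figure mentioned above and would need to match the real column size.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BoundedQuantifierSketch {
    // Hypothetical helper: returns capturing group 1 of the first match, or null.
    static String firstGroup(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Hypothetical helper: reports whether Java accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String field = "content <br>Purchase Timeframe: One Week"
                + "<br>Will Finance Purchase: Yes<br>";

        // Bounded and reluctant: at most 1000 characters, ending at the first <br>.
        System.out.println(firstGroup(field, "<br>Purchase Timeframe: (.{0,1000}?)<br>"));

        // A bound on its own does not make the quantifier reluctant:
        // without the ? the group still runs to the last <br> within the limit.
        System.out.println(firstGroup(field, "<br>Purchase Timeframe: (.{0,1000})<br>"));

        // The lower bound cannot be omitted in Java.
        System.out.println(compiles(".{0,1000}") + " " + compiles(".{,1000}"));
    }
}
```

A value longer than the bound simply fails to match, so rows with oversized content produce no result from this pattern and may need separate handling.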