[resolved] How to extract substrings according to regex pattern in tMap

Hi,
I am new to Talend and am struggling with how to perform regex string manipulations in tMap.
The situation: I have a column with date information embedded in text, like this:
"AC/DC tickets for 20/12/2010"
Here I want to extract the date. My first approach was to use tExtractRegexFields, but since "cycles" in the data flow are not supported, I don't see a way to rejoin this column with the rest of the dataset once it is split off.
I found a regex pattern
"([0-9]{2}/[0-9]{2}/[0-9]{4})"
that identifies the date, but here is my question: what is the correct Java statement I have to write in the tMap expression builder for this column? I tried Pattern.compile() but could not find a valid, working construction.
The source column holds the whole string; the destination column should be stripped down to the date, i.e. the substring matching the regex pattern.
I use TOS 4.0.
Any help is appreciated.
Thanks
dexter

All Replies

Re: [resolved] How to extract substrings according to regex pattern in tMap

Hi,
You can use the tExtractRegexFields component with this pattern: "^(.+)([0-9]{2})/([0-9]{2})/([0-9]{4})$"
In your output schema you just have to declare 4 columns:
- text
- day
- month
- year
And that's all.
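To see what those four groups capture, here is a minimal plain-Java sketch run against the sample string from the question. It is not how tExtractRegexFields is implemented; it only illustrates the pattern. The class and method names are made up for the illustration, and [0-9] is assumed as the digit class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFieldsDemo {
    // Same pattern as in the reply; [0-9] is the assumed digit class.
    static final Pattern FIELDS =
            Pattern.compile("^(.+)([0-9]{2})/([0-9]{2})/([0-9]{4})$");

    /** Returns {text, day, month, year}, or null when the line does not match. */
    static String[] split(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] f = split("AC/DC tickets for 20/12/2010");
        // prints: AC/DC tickets for |20|12|2010
        System.out.println(String.join("|", f));
    }
}
```

Note that the first group keeps the trailing space before the date, and that a row whose text does not end with a date does not match at all.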

Re: [resolved] How to extract substrings according to regex pattern in tMap

Hi eguerin,
Thanks for the quick reply and the improved regex.
tExtractRegexFields isn't the best choice for me here, since as far as I can see it can only split one column at a time, and I have more than one column in the data flow that needs string cleansing.
I have found posts dealing with similar problems:
http://www.talendforge.org/forum/viewtopic.php?id=9720
It suggests that regex string manipulations can be done in tMap, which would be very elegant. But what exactly is the Java statement to extract a substring from a given input string using a defined regex pattern? Should I write it directly into the code, or can I implement it in the tMap expression builder?
Thanks
dexter

Re: [resolved] How to extract substrings according to regex pattern in tMap

OK, you can do this with a routine (in the menu Code > Routines).
You have to create a new routine and write your code in Java.
After that, you can reuse your routine in the expression builder (in the tMap) like this:
routines.yourRoutineName.yourMethodName(param1, paramx)
It's very easy to create a new routine.
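As a rough sketch of what such a routine could look like for the date case in this thread: the class name StringCleansing, the method extractFirst and the column row1.text are all invented for the example, and [0-9] is assumed as the digit class. The class is left package-private so the snippet compiles on its own; in Talend Studio it would be declared public inside the "routines" package.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical routine; in Talend Studio add `package routines;` and make it public.
class StringCleansing {

    /**
     * Returns the first substring of input that matches regex,
     * or null if input is null or nothing matches.
     */
    public static String extractFirst(String input, String regex) {
        if (input == null) {
            return null;
        }
        // Compiled on every call for simplicity; cache the Pattern if this gets slow.
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // prints: 20/12/2010
        System.out.println(
                extractFirst("AC/DC tickets for 20/12/2010", "[0-9]{2}/[0-9]{2}/[0-9]{4}"));
    }
}
```

In the tMap expression for the output column, the call would then presumably be: routines.StringCleansing.extractFirst(row1.text, "[0-9]{2}/[0-9]{2}/[0-9]{4}"). Because the method takes the pattern as a parameter, the same routine can be reused for every column that needs cleansing.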

Re: [resolved] How to extract substrings according to regex pattern in tMap

Thanks,
custom code sounds like a good option. I'll give it a try.
/wave
dexter