Two Stars

tExtractRegexField to split a string in two

Hello,

 

I'm currently working on a job, but I'm facing a problem. I have a field containing strings that I would like to split into two columns, but I can't find the right regex to split it the way I want. Here is an example.

All my fields are structured this way: " hello, my name is steven, I live in Paris (215484845)"

I want to separate it into: 1/ hello, my name is steven, I live in Paris     and    2/ 215484845 (the ID)

 

However, other parentheses may appear before the ones that hold the ID; the ID itself is always at the end of the string.

So if a string like this is found: "Hello, my name is jeff, i'm (27) and i like ice cream (58646846)", I want to split it this way:

1/ Hello, my name is jeff, i'm (27) and i like ice cream and 2/ 58646846

Thanks in advance for your help!

3 REPLIES
Six Stars

Re: tExtractRegexField to split a string in two

I think you're going to need to exploit the structure of the ID to do this and split when the parser finds a parenthesized string with 9 digits (ignoring all other parenthesized tokens). Is that something you can do, or will that lead to data errors?
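For illustration, here is a minimal regex sketch in plain Java. It is a variation on the idea above, not the same rule: instead of requiring exactly 9 digits, it matches any run of digits in parentheses and anchors it to the end of the string, since the ID is said to always come last. The class name TrailingIdRegex and the method split are made up for this example. In tExtractRegexField the same pattern would presumably be entered as the regex, with the two capture groups feeding the two output columns, but check that against your Talend version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingIdRegex {
    // Group 1: the text before the trailing "(digits)", without surrounding whitespace.
    // Group 2: the digits inside the last pair of parentheses.
    // The lazy (.*?) plus the end anchor forces the match onto the LAST parenthesized number.
    static final Pattern TRAILING_ID =
            Pattern.compile("^\\s*(.*?)\\s*\\((\\d+)\\)\\s*$");

    // Returns {text, id}, or null when the input does not end with "(digits)".
    static String[] split(String input) {
        Matcher m = TRAILING_ID.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("Hello, my name is jeff, i'm (27) and i like ice cream (58646846)");
        System.out.println(parts[0]); // Hello, my name is jeff, i'm (27) and i like ice cream
        System.out.println(parts[1]); // 58646846
    }
}
```

If the IDs really do always have a fixed length, \\d+ could be tightened to \\d{9}, at the cost of rejecting rows like the 8-digit one in the second example.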

 

David

Ten Stars

Re: tExtractRegexField to split a string in two

There is a Java string method lastIndexOf() which may be helpful to you. It returns the index of the last appearance of a character or string in another string.

rowX.columnName.lastIndexOf("(") and rowX.columnName.lastIndexOf(")") should get you the positions of the parentheses that open and close your ID. Using those in a substring call can extract the ID without having to apply a regex.
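A rough sketch of that idea in standalone Java, in case it helps. The class LastParenId and the method extractId are hypothetical names for this example; in a job you would apply the same calls to rowX.columnName. The check for a missing or misordered pair of parentheses is my own addition.

```java
class LastParenId {
    // Returns the text between the last "(" and the last ")",
    // or null when the string has no such pair in that order.
    static String extractId(String value) {
        int open = value.lastIndexOf("(");
        int close = value.lastIndexOf(")");
        if (open < 0 || close < open) {
            return null;
        }
        // substring's end index is exclusive, so the ")" itself is left out.
        return value.substring(open + 1, close);
    }

    public static void main(String[] args) {
        String s = "Hello, my name is jeff, i'm (27) and i like ice cream (58646846)";
        System.out.println(extractId(s)); // 58646846
    }
}
```

Because both lookups start from the end of the string, the earlier "(27)" is never considered.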
Six Stars

Re: tExtractRegexField to split a string in two

Please check the attached job. 

 

In tMap, define two output columns:

Column1: row1.Desc.substring(0,row1.Desc.lastIndexOf("(")-1) 

Column2: row1.Desc.substring(row1.Desc.lastIndexOf("(")+1).replace(")","") 

 

row1.Desc is simply your input string.
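One caveat on the expressions above: they assume every row contains a "(" with exactly one character (the space) in front of it. If a row has no "(" at all, lastIndexOf returns -1 and the first substring call throws a StringIndexOutOfBoundsException. A more defensive variant could look like the sketch below. The class DescSplitter and its methods are hypothetical names; the idea is that they could live in a user routine and be called from tMap as DescSplitter.textPart(row1.Desc) and DescSplitter.idPart(row1.Desc), assuming routines are set up that way in your project.

```java
class DescSplitter {
    // Text before the last "(", trimmed. Falls back to the whole
    // trimmed string when there is no "(" at all.
    static String textPart(String desc) {
        if (desc == null) {
            return null;
        }
        int open = desc.lastIndexOf("(");
        return (open < 0 ? desc : desc.substring(0, open)).trim();
    }

    // Content after the last "(" with any ")" removed, trimmed.
    // Returns null when there is no "(".
    static String idPart(String desc) {
        if (desc == null) {
            return null;
        }
        int open = desc.lastIndexOf("(");
        return open < 0 ? null : desc.substring(open + 1).replace(")", "").trim();
    }
}
```

Using trim() instead of the fixed -1 offset also copes with rows that have zero or several spaces before the parenthesis. Note that it strips leading whitespace too.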

 

Regards,

 

Veeranjaneyulu Boppudi