Regex to match repeated letters in string using java pattern

Four Stars

Talend version: 4.2.3 / 5.2.0M3
OS: Windows / Mac
I am trying to filter out phone numbers that consist of repeating characters, e.g.
0000, 111111, 99999999, 8888, 22222, 000000000 - basically anything that is a single character repeated.
I am using the following regex in the tMap:
row1.Phone.matches("()\1{3}")?"":row1.Phone
or this:
row1.Phone.matches("()\\1{3}")?"":row1.Phone (with the backslash escaped for Java)
When testing the expression I get this:
Exception in thread "Main" java.lang.error : Unresolved compilation errors
This works outside Talend - see the second screenshot.
Any ideas?
Employee

Re: Regex to match repeated letters in string using java pattern

Actually, a job is generated behind the scenes to test this record.
Because your test value is not quoted, the generated job contains an expression like 123.matches(...), which is not valid Java syntax.
You can replace all the occurrences of <row1.Phone> with <String.valueOf(row1.Phone)> or even <row1.Phone+""> to avoid the compilation problem.
This compilation problem does not appear when you run the job, only in the test area, which is admittedly confusing for users.
Another thing is that the method "string.matches(regex)" must match the entire string, so it will only filter records like "1111" or "2222", but not "12222" or "22221". The result is the same with "Pattern.matches(regex, string)".
So I propose using the following expression to filter inputs that contain a repetition anywhere inside:
java.util.regex.Pattern.compile("()\\1{3}").matcher(String.valueOf(row1.Phone)+"").find() ? "" : String.valueOf(row1.Phone)
This works in the test area too, since I added "String.valueOf".
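To make the difference between matches() and find() concrete, here is a minimal sketch. The capture group in the expressions above appears empty as posted, so the pattern (\\d)\\1{3} used here (one digit followed by three more copies of itself) is only an assumption, and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class RepeatRegexDemo {
    // Assumed pattern: one digit followed by three more copies of the same digit.
    static final Pattern FOUR_IN_A_ROW = Pattern.compile("(\\d)\\1{3}");

    // True only when the whole string is exactly four identical digits.
    static boolean wholeStringMatches(String s) {
        return FOUR_IN_A_ROW.matcher(s).matches();
    }

    // True when a run of four identical digits occurs anywhere in the string.
    static boolean containsRun(String s) {
        return FOUR_IN_A_ROW.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1111", "12222", "22221", "1234"}) {
            System.out.println(s + " matches=" + wholeStringMatches(s)
                    + " find=" + containsRun(s));
        }
    }
}
```

With this pattern, find() also flags any longer number that merely contains a run of four identical digits, which may or may not be what you want.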
Four Stars

Re: Regex to match repeated letters in string using java pattern

Thanks, no more compilation errors. I see now, very powerful. It's possible to call external Java classes!
Very close now, but not sure all the use cases are covered.
A phone number like this results in null
0800 2222 1234
It's beginning to look like RegEx is not the best answer. What do you think / recommend?
Thanks
Four Stars

Re: Regex to match repeated letters in string using java pattern

I'm thinking of a custom routine: get the first char and the string length, then compare with the original string.
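A rough sketch of that idea, assuming it means "rebuild the string from its first character and compare". The class and method names are hypothetical, not part of Talend.

```java
class RepeatCheckSketch {
    // Hypothetical helper: true when s is non-empty and every character
    // equals the first one.
    static boolean isSingleRepeatedChar(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        char first = s.charAt(0);
        // Build a string of the same length made only of the first character.
        StringBuilder expected = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            expected.append(first);
        }
        // If it equals the original, the original was one character repeated.
        return expected.toString().equals(s);
    }
}
```

Note that a one-character string counts as "repeated" here. Whether that is desirable for phone numbers is a judgment call.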
Four Stars

Re: Regex to match repeated letters in string using java pattern

Final solution is this - added a custom routine which I call in the tMap
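package routines;

import java.util.regex.Pattern;

public class FT_CompareString {

    // Returns null when the input (ignoring spaces) is empty or is a single
    // character repeated; otherwise returns the input unchanged.
    public static String CheckString(String input) {
        if (input == null) {
            return null;
        }
        String compact = input.replace(" ", "");
        if (compact.isEmpty()) {
            return null;
        }
        // Splitting on the first character leaves nothing when every character
        // is the same. Pattern.quote keeps characters like "+" or "(" literal.
        String first = Pattern.quote(compact.substring(0, 1));
        return compact.split(first).length > 0 ? input : null;
    }
}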
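For comparison, the same whole-string check can probably be written as a regex after all, as long as matches() is used on the space-stripped value instead of find(). This is only a sketch under that assumption. The class name, the method name and the pattern (.)\\1* are illustrative and were not part of the original routine.

```java
import java.util.regex.Pattern;

class RepeatedOnlyFilter {
    // Any single character followed by zero or more copies of itself.
    private static final Pattern ONE_CHAR_REPEATED = Pattern.compile("(.)\\1*");

    // Returns null for null, blank, or single-repeated-character input;
    // otherwise returns the input unchanged.
    static String keepUnlessRepeated(String input) {
        if (input == null) {
            return null;
        }
        String compact = input.replace(" ", "");
        if (compact.isEmpty() || ONE_CHAR_REPEATED.matcher(compact).matches()) {
            return null;
        }
        return input;
    }
}
```

Because the whole value has to match, a number such as 0800 2222 1234 is kept even though it contains a run of identical digits.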