views: 3189

answers: 3

Can anyone give me a Java regex to identify repeated characters in a string? I am only looking for characters that are repeated immediately and they can be letters or digits.

Example:

abccde <- looking for this (immediately repeating c's)

abcdce <- not this (c's separated by another character)

+8  A: 

Try "(\\w)\\1+"

The \\w matches any word character (letter, digit, or underscore) and the \\1+ matches whatever was in the first set of parentheses, one or more times. So you wind up matching any occurrence of a word character, followed immediately by one or more of the same word character again.

(Note that I gave the regex as a Java string, i.e. with the backslashes already doubled for you)
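
As a rough sketch of how this pattern might be used in practice, the snippet below collects every run of immediately repeated characters with Matcher.find(). The class name RepeatFinder and the helper findRuns are made up for illustration and are not part of the original answer. Keep in mind that \\w also accepts the underscore, which is slightly broader than "letters or digits".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatFinder {
    // Same pattern as in the answer: one word character, then one or more copies of it.
    private static final Pattern REPEAT = Pattern.compile("(\\w)\\1+");

    // Hypothetical helper: returns each run of repeated characters, in order of appearance.
    static List<String> findRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = REPEAT.matcher(input);
        while (m.find()) {
            // group() is the whole run; group(1) would be the single repeated character.
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findRuns("abccde"));    // prints [cc]
        System.out.println(findRuns("abcdce"));    // prints []
        System.out.println(findRuns("aabbbcddd")); // prints [aa, bbb, ddd]
    }
}
```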

David Zaslavsky
Good one, David. But maybe it should be "((\\w)\\2+)+". That would match the repeating pair any number of times and would match the entire set of repeating occurrences in Backref #1.
Cerebrus
Since Java implicitly adds the "^" and "$" delimiters, this expression will match strings like "cc" and "cccc" but not "xcc" etc. That's where I get stuck. How can I make the regex match anywhere in the string?
JediPotPie
I guess my problem was that I was using the "matches()" method to check for a match. My mistake. Thanks for the help.
JediPotPie
@Cerebrus, I don't see the benefit of that outer set of parentheses. If the input were "aabbbcddd" your regex would match "aabbb" the first time you call find(), then match "ddd" the next time around. All you get is a trivial performance gain for performing fewer matches.
Alan Moore
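
To make the difference between the two patterns concrete, here is a small sketch that runs both against the "aabbbcddd" input from the comment above. The class PatternComparison and the helper allMatches are illustrative names, not anything from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternComparison {
    // Hypothetical helper: returns the full text of every match found by successive find() calls.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "aabbbcddd";
        // The simple pattern reports each run separately.
        System.out.println(allMatches("(\\w)\\1+", input));     // prints [aa, bbb, ddd]
        // The nested pattern merges adjacent runs into a single match.
        System.out.println(allMatches("((\\w)\\2+)+", input));  // prints [aabbb, ddd]
    }
}
```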
A more explicit regex is as follows: ".*([0-9A-Za-z])\\1+.*". This searches for repeats anywhere in the string and can be used with pattern.matcher(...).matches()
Gennadiy
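
The matches() versus find() confusion in the comments above can be illustrated with a short sketch. matches() requires the pattern to cover the whole input, while find() looks for the pattern anywhere in it. The class MatchesVsFind and its two helper methods are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // Hypothetical helper: true only if the regex matches the entire input.
    static boolean wholeStringMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Hypothetical helper: true if the regex matches any substring of the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String simple = "(\\w)\\1+";
        String padded = ".*([0-9A-Za-z])\\1+.*";

        System.out.println(wholeStringMatches(simple, "cc"));     // prints true
        System.out.println(wholeStringMatches(simple, "xcc"));    // prints false
        System.out.println(containsMatch(simple, "xcc"));         // prints true
        // Padding with .* lets matches() succeed on input that merely contains a repeat.
        System.out.println(wholeStringMatches(padded, "xcc"));    // prints true
        System.out.println(wholeStringMatches(padded, "abcdce")); // prints false
    }
}
```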
A: 

Regular Expressions are expensive. You would probably be better off just storing the last character and checking to see if the next one is the same. Something along the lines of:

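String s = "abccde"; // example input; assumed to be non-empty
char c1 = s.charAt(0);
for (int i = 1; i < s.length(); i++) {
    char c2 = s.charAt(i);

    // Report a letter or digit that immediately repeats the previous character
    if (c1 == c2 && Character.isLetterOrDigit(c2)) {
        System.out.println("Repeated character: " + c2);
    }

    c1 = c2;
}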
More expensive than manually iterating through a string's chars? I don't think so!
Cerebrus
Yep, that's one way to do it, but that's not the way I need. I need a regular expression.
JediPotPie
@John Terry: you think this is a worthwhile optimization and yet you program in Java? Strange. The regex-version is shorter and quicker to grok. I'd choose it any day.
Joachim Sauer
And simply saying regexes are "expensive" is just FUD. It's true that a regex-based solution can never be as fast as a well-written solution based on the low-level String API, but Java's regexes are plenty fast enough for most applications.
Alan Moore
+2  A: 
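import java.util.regex.Matcher;
import java.util.regex.Pattern;

String stringToMatch = "abccdef";
Pattern p = Pattern.compile("(\\w)\\1+");
Matcher m = p.matcher(stringToMatch);
// find() looks for the pattern anywhere in the string
if (m.find())
{
    // group(1) is the single character that was repeated
    System.out.println("Duplicate character " + m.group(1));
}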
Simon Nickerson