I would like to use a regular expression to match all words that contain more than one distinct character, as opposed to words made entirely of the same character.

This should not match: ttttt, rrrrr, ggggggggggggg

This should match: rttttttt, word, wwwwwwwwwu

A: 

The following regex does the opposite of what you're asking for: it matches words composed entirely of one repeated character. It may still be useful to you, though.

\b(\w)\1*\b
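
To see what this pattern picks out of running text, here is a minimal sketch using Python's re module. The choice of Python, the name SAME_CHAR_WORD and the sample text (built from the words in the question) are assumptions for illustration only.

```python
import re

# A whole word made of one repeated character; \1* also allows a single-letter word.
SAME_CHAR_WORD = re.compile(r"\b(\w)\1*\b")

text = "ttttt rttttttt word rrrrr wwwwwwwwwu"

# group(0) is the whole match; group(1) would only be the first character.
uniform = [m.group(0) for m in SAME_CHAR_WORD.finditer(text)]
print(uniform)  # ['ttttt', 'rrrrr']
```

So the pattern finds exactly the words the question wants to exclude, which means the result still has to be inverted somehow.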
pmarflee
+1  A: 

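I would add all the unique words to a list and then use this regex

\b(\w)\1+\b

to find the words made of a single repeated character and get rid of them. Note that \1+ requires at least one repetition, so a one-letter word is not matched by this pattern.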
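
A rough sketch of that filtering step in Python, assuming the words are already in a list. The names UNIFORM, words and kept are made up for the example.

```python
import re

# One character followed by at least one repetition of itself, as a whole word.
UNIFORM = re.compile(r"\b(\w)\1+\b")

words = ["ttttt", "rttttttt", "word", "rrrrr", "wwwwwwwwwu", "a"]

# Keep every word that is NOT a run of one repeated character.
kept = [w for w in words if not UNIFORM.fullmatch(w)]
print(kept)  # ['rttttttt', 'word', 'wwwwwwwwwu', 'a']
```

The single-letter word "a" survives because \1+ needs at least two characters in total; use \1* instead if such words should be dropped as well.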

+1  A: 

This doesn't use a regular expression, but I believe it will do what you require:

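// Requires: using System.Linq;
public bool Match(string str)
{
    // A null or empty string has no character that differs from the first, so it does not match.
    return !string.IsNullOrEmpty(str)
               && str.Skip(1)
                     .Any(c => c != str[0]);
}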
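
For comparison, the same regex-free idea can be sketched in Python. The function name has_distinct_chars is hypothetical, and treating an empty string as a non-match is an assumption of this sketch.

```python
def has_distinct_chars(word: str) -> bool:
    """Return True if the word contains at least two different characters."""
    # A set keeps one copy of each character, so more than one element
    # means the word is not a run of a single repeated character.
    return len(set(word)) > 1


print(has_distinct_chars("rttttttt"))  # True
print(has_distinct_chars("ttttt"))     # False
```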
tvanfosson
A: 
\b\w*?(\w)\1*(?:(?!\1)\w)\w*\b

or

\b(\w)(?!\1*\b)\w*\b

This assumes you're plucking the words out of some larger text; that's why it needs the word boundaries and the padding. If you have a list of words and you're just trying to validate the ones that meet the criteria, a much simpler regex would probably do:

(.)(?:(?!\1).)

...because you already know each word contains only word characters. On the other hand, depending on your definition of "word" you might need to replace \w in the first two regexes with something more specific, like [A-Za-z].
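
A small Python sketch that runs the three patterns above against the sample words from the question. The variable names and the sample text are assumptions for the example; the patterns are copied unchanged.

```python
import re

# The two patterns meant for plucking words out of a larger text.
IN_TEXT_A = re.compile(r"\b\w*?(\w)\1*(?:(?!\1)\w)\w*\b")
IN_TEXT_B = re.compile(r"\b(\w)(?!\1*\b)\w*\b")
# The simpler pattern for validating a word that has already been isolated.
VALIDATE = re.compile(r"(.)(?:(?!\1).)")

text = "ttttt rttttttt word rrrrr wwwwwwwwwu"

# finditer + group(0) returns whole matches rather than the captured group.
found_a = [m.group(0) for m in IN_TEXT_A.finditer(text)]
found_b = [m.group(0) for m in IN_TEXT_B.finditer(text)]

# search (not fullmatch): one character followed by a different one is enough.
valid = [w for w in text.split() if VALIDATE.search(w)]

print(found_a)  # ['rttttttt', 'word', 'wwwwwwwwwu']
print(found_b)  # ['rttttttt', 'word', 'wwwwwwwwwu']
print(valid)    # ['rttttttt', 'word', 'wwwwwwwwwu']
```

Note that the validating pattern only works with a search-style call, since it matches just the two adjacent characters that differ, not the whole word.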

Alan Moore
+6  A: 

The following expression will do the trick.

^(?<FIRST>[a-zA-Z])[a-zA-Z]*?(?!\k<FIRST>)[a-zA-Z]+$
  • capture the first character into the group FIRST
  • match zero or more further characters, lazily, so that the next step is tried at the earliest possible position
  • ensure that the next character is different from FIRST using a negative lookahead assertion
  • match all remaining characters (at least one, so the character checked by the assertion really exists)

Note that it is sufficient to look for a character that is different from the first one, because if no character is different from the first one, all characters are equal.

You can shorten the expression to the following.

^(\w)\w*?(?!\1)\w+$

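Note that \w also matches digits and the underscore (and, depending on the regex flavor, non-ASCII letters), so this version accepts more characters than [a-zA-Z].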
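
Here is a hedged sketch of both expressions in Python. One assumption to be aware of: Python's re module spells named groups as (?P<FIRST>...) and named backreferences as (?P=FIRST), so the first pattern is rewritten in that syntax. The names NAMED and SHORT are made up for the example.

```python
import re

# Python spelling of ^(?<FIRST>[a-zA-Z])[a-zA-Z]*?(?!\k<FIRST>)[a-zA-Z]+$
NAMED = re.compile(r"^(?P<FIRST>[a-zA-Z])[a-zA-Z]*?(?!(?P=FIRST))[a-zA-Z]+$")
# The shortened version, unchanged.
SHORT = re.compile(r"^(\w)\w*?(?!\1)\w+$")

for word in ["ttttt", "rrrrr", "rttttttt", "word", "wwwwwwwwwu"]:
    print(word, bool(NAMED.match(word)), bool(SHORT.match(word)))

# The two versions differ once digits or underscores appear.
print(bool(NAMED.match("w0rd")), bool(SHORT.match("w0rd")))  # False True
```

The loop prints False for both patterns on ttttt and rrrrr, and True for both on the other three words.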

Daniel Brückner
Nice explanation
Neil Williams