Question:
Is it possible, with regex, to match a word that contains the same character in different positions?

Condition:
All words have the same length. You know the positions of the repeated character (for example the 1st, the 2nd and the 4th), but you don't know which character it is.

Examples:
Using lowercase 6-character words, I'd like to match words where the 3rd and the 4th characters are the same.

parrot <- match for double r
follia <- match for double l 
carrot <- match for double r
mattia <- match for double t
rettoo <- match for double t
melone <- doesn't match

I can't use the quantifier [\d]{2} because it matches any succession of two chars, not necessarily the same one. And what if I want the 2nd and the 4th positions instead of the 3rd and the 4th?

Is it possible to do what I want with regex? If yes, how can I do that?

EDIT:
As asked in the comments, I'm using Python.

+10  A: 

You can use a backreference to do this:

(.)\1

This matches any character that is immediately followed by the same character.


Edit: Here's a Python example:

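import re

# (.) captures one character; \1 requires the same character right after it.
regexp = re.compile(r"(.)\1")
data = ["parrot", "follia", "carrot", "mattia", "rettoo", "melone"]

for word in data:  # "word" avoids shadowing the built-in str
    match = regexp.search(word)
    if match:
        print(word, "<- match for double", match.group(1))
    else:
        print(word, "<- doesn't match")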
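
The question also asks about fixed positions, such as the 2nd and the 4th character. A minimal sketch, assuming six-character words as in the question: capture the character at the first listed position, put the backreference at the other listed positions, and fill the rest with `.`. The helper name `same_char_at` is hypothetical and not part of the original answer.

```python
import re

def same_char_at(positions, length=6):
    """Build a regex for words of `length` chars whose 1-based
    `positions` all hold the same character (hypothetical helper)."""
    first = min(positions)
    parts = []
    for i in range(1, length + 1):
        if i == first:
            parts.append("(.)")    # capture the character once
        elif i in positions:
            parts.append(r"\1")    # must equal the captured character
        else:
            parts.append(".")      # any character
    return re.compile("^" + "".join(parts) + "$")

third_fourth = same_char_at({3, 4})     # builds ^..(.)\1..$
second_fourth = same_char_at({2, 4})    # builds ^.(.).\1..$

words = ["parrot", "follia", "carrot", "mattia", "rettoo", "melone"]
matches_34 = [w for w in words if third_fourth.match(w)]
matches_24 = [w for w in words if second_fourth.match(w)]
print(matches_34)
print(matches_24)
```

With the sample words from the question, only "melone" fails the 3rd/4th check, and none of them has the same character in the 2nd and 4th positions (a word like "banana" would).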
Gumbo
the alternative to this is (aa|bb|cc|..zz|AA|BB|lol)
dfa
+1  A: 

/(\b\w*?(\w)\2.*?\b)/

will match any word with at least one repeated character, $1 being the word and $2 the first repeated character.
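
As a rough illustration in Python (the language the asker mentions), the same pattern can be used without the `/.../` delimiters, and the groups are read from the match rather than from `$1` and `$2`. The sample text is made up from the question's words.

```python
import re

# The pattern from this answer, written as a Python raw string.
word_with_double = re.compile(r"(\b\w*?(\w)\2.*?\b)")

text = "parrot melone follia"
# With two groups, findall returns (group 1, group 2) tuples:
# the whole word and its first doubled character.
found = word_with_double.findall(text)
print(found)
```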

Martijn Laarman
+5  A: 

Hi,

You need to use backreferences for such cases. I am not sure which language you are using; I tried the following example in vi to search for any repeated letter. Pattern: \([a-z]\)\1

In this example, [a-z] is the pattern you are searching for, enclosed in parentheses (the parentheses must be escaped in some tools, as in this vi pattern). Once a pattern is inside parentheses, it is a group and can be referred to again anywhere in the regex by using \1. If there is more than one group, you can use \1, \2, etc. \1 matches whatever text was matched by the first group.
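
In Python's re module the same idea applies, except that grouping parentheses are written unescaped. A small sketch, with sample strings chosen only for illustration, showing one group with \1 and two groups with \1 and \2:

```python
import re

# One group: a lowercase letter followed by the same letter.
double_letter = re.compile(r"([a-z])\1")

# Two groups: \2 repeats the second capture, \1 the first (an "abba" shape).
mirrored = re.compile(r"([a-z])([a-z])\2\1")

m1 = double_letter.search("mattia")
m2 = mirrored.search("otto")

print(m1.group(0), m1.group(1))               # whole match, then group 1
print(m2.group(0), m2.group(1), m2.group(2))  # whole match, groups 1 and 2
```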

Thanks Arvind

Arvind
A: 

Yes, you can use a backreference construct to match the double letters.

The regular expression (?<char>\w)\k<char>, using named groups and backreferencing, searches for adjacent paired characters. When applied to the string "I'll have a small coffee", it finds matches in the words "I'll", "small", and "coffee". The metacharacter \w matches any single word character. The grouping construct (?<char>) encloses the metacharacter to force the regular expression engine to remember a subexpression match (which, in this case, will be any single word character) and save it under the name "char". The backreference construct \k<char> causes the engine to compare the current character to the previously matched character stored under "char". The entire regular expression successfully finds a match wherever a single character is the same as the preceding character.
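
Since the asker is using Python, note that Python's re module spells these constructs differently: a named group is (?P<char>...) and the named backreference is (?P=char). A minimal sketch of the same search on the sentence used above:

```python
import re

# Python's spelling of a named group and a named backreference.
paired = re.compile(r"(?P<char>\w)(?P=char)")

sentence = "I'll have a small coffee"
# group(0) is the whole match, i.e. the doubled character pair.
pairs = [m.group(0) for m in paired.finditer(sentence)]
print(pairs)
```

Each match also exposes the remembered character by name through m.group("char").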

Rashmi Pandit
You should always use backticks or code blocks to format any source code you include in your posts. This answer made no sense at all until I added backticks around your regexes.
Alan Moore
Oops!! My bad! Thanks Alan :)
Rashmi Pandit