Hi,

I'm struggling to write regular expressions that match the patterns "ABAB", "AABB", "ABB", "AAB", "ABAC" and "ABCB".

Take "ABAB" as an example: all of the following strings should match:

abab
bcbc
1212
xyxy
9090
0909

In other words, the regex should match a string whose 1st and 3rd characters are the same and whose 2nd and 4th characters are the same, but whose 1st and 2nd characters are NOT the same (so the 3rd and 4th differ as well).
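The rule above can also be stated without a regex, which is handy as a reference when testing the regexes in the answer. This is a minimal sketch, not part of the original thread; it assumes Python and a hypothetical helper `shape` that labels each distinct character by order of first appearance, so a string fits a pattern such as "ABAB" exactly when its shape equals that pattern.

```python
def shape(s):
    """Return the letter 'shape' of s: the first distinct character becomes 'A',
    the second distinct one 'B', and so on. E.g. "xyxy" -> "ABAB".
    Intended for short strings (at most 26 distinct characters)."""
    labels = {}
    out = []
    for ch in s:
        if ch not in labels:
            # A character seen for the first time gets the next unused letter.
            labels[ch] = chr(ord("A") + len(labels))
        out.append(labels[ch])
    return "".join(out)


examples = ["abab", "bcbc", "1212", "xyxy", "9090", "0909"]
print(all(shape(s) == "ABAB" for s in examples))  # True
print(shape("aaaa"))  # AAAA: not ABAB, because the 1st and 2nd characters are equal
```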

Do I make myself clear?

Thanks.

Peter

A:

ABAB-like pattern

(\w)(\w(?<!\1))\1\2
  • (\w) match a word character (digit, letter...) and capture the match into backreference 1
  • (\w...) match a word character (digit, letter...) and capture the match into backreference 2
  • (?<!\1) negative lookbehind: assert that the character just matched (ending at this position) is not the text captured by group 1, so B cannot equal A
  • \1 match the same text as most recently matched by capturing group number 1
  • \2 match the same text as most recently matched by capturing group number 2
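A minimal Python sketch of the ABAB check, assuming Python's `re` module. Instead of the lookbehind shown above, it uses the equivalent negative-lookahead form `(\w)(?!\1)(\w)\1\2`: the lookahead rejects the second character before it is consumed if it equals the first, which has the same effect. `re.fullmatch` is used so the whole string must fit the shape; the pattern on its own would also find ABAB inside a longer string. The helper name `is_abab` is hypothetical.

```python
import re

# ABAB: group 1 = A; (?!\1) forbids B from equaling A; then A and B repeat.
ABAB = re.compile(r"(\w)(?!\1)(\w)\1\2")


def is_abab(s):
    # fullmatch anchors the pattern to the entire string.
    return ABAB.fullmatch(s) is not None


for s in ["abab", "bcbc", "1212", "xyxy", "9090", "0909", "aaaa", "abba"]:
    print(s, is_abab(s))
```

Note that `\w` also matches the underscore, so "_a_a" counts as ABAB too.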

Other patterns

  • AABB ==> (\w)\1(\w(?<!\1))\2
  • ABB ==> (\w)(\w(?<!\1))\2
  • AAB ==> (\w)\1(\w(?<!\1))
  • ABAC ==> (\w)(\w(?<!\1))\1(\w(?<!\1|\2))
  • ABCB ==> (\w)(\w(?<!\1))(\w(?<!\1|\2))\2
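The patterns above can be checked side by side. This Python sketch rewrites each one with negative lookahead in place of the answer's lookbehind (an equivalent formulation, not the answer's exact text) and uses a hypothetical `classify` helper to report which shapes match a whole string.

```python
import re

# Lookahead rewrites of the patterns above: (?!\1) means "the next character is
# not A", and (?!\1|\2) means "the next character is neither A nor B".
PATTERNS = {
    "ABAB": r"(\w)(?!\1)(\w)\1\2",
    "AABB": r"(\w)\1(?!\1)(\w)\2",
    "ABB":  r"(\w)(?!\1)(\w)\2",
    "AAB":  r"(\w)\1(?!\1)(\w)",
    "ABAC": r"(\w)(?!\1)(\w)\1(?!\1|\2)(\w)",
    "ABCB": r"(\w)(?!\1)(\w)(?!\1|\2)(\w)\2",
}
COMPILED = {name: re.compile(p) for name, p in PATTERNS.items()}


def classify(s):
    """Return the names of every pattern that matches the whole string s."""
    return [name for name, rx in COMPILED.items() if rx.fullmatch(s)]


print(classify("xyxz"))  # ['ABAC']
```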
madgnome
Extra points for the very good explanation.
Paddy
Hi madgnome, you are awesome. It works great :-)
Peter Lee
Hi madgnome, another small question: the string "12121" seems to contain two matches ("1212" and "2121"), but the regex (@"(\w)(\w(?<!\1))\1\2") only returns one. How can I find both matches?
Peter Lee
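The thread does not answer the overlapping-match question. One common technique (an assumption here, not something stated in the original answer) is to wrap the whole pattern in a zero-width lookahead with a capturing group: the lookahead consumes nothing, so the engine retries at the next character and every starting position is tested. A Python sketch using the lookahead form of the ABAB pattern and a hypothetical helper `find_overlapping_abab`:

```python
import re

# The outer (?=(...)) is zero-width, so the scan advances one character at a time
# and overlapping occurrences are all reported. Group 1 holds the matched text;
# the inner groups are therefore renumbered to 2 and 3.
OVERLAPPING_ABAB = re.compile(r"(?=((\w)(?!\2)(\w)\2\3))")


def find_overlapping_abab(text):
    return [m.group(1) for m in OVERLAPPING_ABAB.finditer(text)]


print(find_overlapping_abab("12121"))  # ['1212', '2121']
```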