tags:

views:

83

answers:

1

hi,

we're learning about the difference between regular languages and regex, and the teacher explained that the language

a^n b^n

is not regular, but she said that most regex flavors can match

a^n A^n

and she gave this problem for our extra credit homework problem. we've been struggling for a couple of days now, and could really use some guidance.

thanks in advance.

+10  A: 

The teacher gave a HUGE hint by limiting the alphabet to {a, A}. The key to solving this problem is realizing that in case-insensitive mode, a matches A and vice versa. The other component of the problem is matching on backreference.

This pattern will match a{n}A{n} for some n (as seen on rubular.com):

^(?=(a*)A*$)\1(?i)\1$

How it works

The pattern works as follows:

  • ^(?=(a*)A*$) - Anchored at the beginning of the string, we use positive lookahead to see if we can match (a*)A* until the end of the string
    • If we can succesfully make this assertion, then \1 captures the sequence of a{n}
    • If the assertion fails, then there's no way the string can be a{n}A{n}, since it's not even a*A*
  • We then match \1(?i)\1$, that is, the a{n} captured by \1, then in case-insensitive mode (?i), we match a{n} again until the end of the string
  • Thus, since:
    • The string is a*A*
    • \1 is a{n}
    • \1(?i)\1 can only be a{n}A{n}

Related questions

References

See also

polygenelubricants
+1 That's a seriously clever answer.
cletus
That's a seriously clever homework problem.
polygenelubricants
For someone just learning the language theory, this little blurb is worth reading after this answer : http://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages
Donnie
+1, Nice solution. But one thing that confused me, how did you infer that 'a^n' means 'a' repeated n times? I didn't get that from the question, but I guess it makes sense.
Stephen
@Stephen: I believe it's a common notation in language theory. The question I linked to uses the same notation. I've modified my answer to use traditional regex notation.
polygenelubricants
@polygene : I see, thanks.
Stephen