tags:

views:

682

answers:

5

How can I describe the language

A → AA | ( A ) | ε

generates using regular expressions?

+6  A: 

Regular expressions cannot match nesting brackets.

David Schmitt
Not so sure... http://blog.stevenlevithan.com/archives/regex-recursion
jm
@jm: If you had read the article fully, Steve writes that "regular" regexes can match the brackets only if "braces are always balanced, andthe level of brace nesting is no more than one." Using extended syntax like balancing groups (http://blog.stevenlevithan.com/archives/balancing-groups) which seem to be a clever way to create recursive decent parsers doesn't count for me.
David Schmitt
+12  A: 

Regular expressions accept strings from regular languages. Regular languages can also be accepted by an FSM.

There's an potentially infinite number of brackets in your language that you have to match up. That means you need an infinite state, obviously impossible in any Finite State Machine. Therefore, your language isn't regular and can't be matched with a regex.

MSalters
+1  A: 

I'm not sure about how you could notate that language, but that language isn't regular, as can be shown using the pumping lemma for regular languages (and thus, it can't be noted by a regex). An intuitive explanation is that accepting words from that language would require the FDA to 'remember' the number of opening parenthesis that it just read every time it begins to read closing parenthesis, and that isn't possible for them as they have no 'memory'. A push-down automaton, on the other hand...

Could that language be noted as {(n)n}*, for any n?

A: 

You can't - regular expressions can only recognise a small subset of possible languages. In particular, informally, any language that requires an unbounded amount of memory potentially to recognise is not RE recognisable.

Here, you'd need an unbounded amount of memory to remember how many opening parentheses you've seen in order to make sure the number of closing parentheses are the same.

You'll need some mechanism that is capable of parsing Context-Free Grammars to be able to recognise languages described by BNF in general. Modern parsers are very good at this!

HenryR
A: 

As others have said you cannot do this with a single regular expression, however you can tokenize it with two "\(" and "\)". Seeing that your language can only have brackets in it though I'm not sure this is going to be very useful.

Note: You will also need a passer to ensure the brackets are paired up correctly. So “(()()” will tokenize but will not parse.

Martin Brown