ansaurus

Question

Answer 1

+2 A:

The traditional answer to this is a resounding "NO". As we have learned in the compilers class, a regular grammar can't be used to describe a language with a recursive definition (i.e. you can't use a finite state machine)

What's needed here is a context free parser, the implementation of which boils down to a finite state machine + a STACK.
See ANTLR, bison etc.

Cristi Diaconescu 2010-08-29 10:14:38

Next time you might want to consider leaving a few minutes before posting your answer as it is difficult for other people to get any votes if there is already an upvoted post, so this may discourage many other people from posting... or even from looking at the question (if the question is already answered many people won't even see it). I do assume you are interested in receiving other opinions, otherwise you wouldn't have posted... right? PS: in .NET I believe that it is possible to do this using "regular" expressions but you're right that it's not advisable to use regex for this purpose.

Mark Byers 2010-08-29 10:26:51

@mark Note taken. And yes, I'm quite interested in the answers. I remember having read somewhere about some less orthodox extensions to a regex library that allows matching parens under circumstances, but I can't remember which library or which circumstances...

Cristi Diaconescu 2010-08-29 10:32:45

I can't follow any of this. People should wait before posting correct answers? Regular expressions can be used for languages requiring a DPDA?

EJP 2010-08-29 10:36:12

@EJP: It's a self-answer and *strictly* speaking it isn't correct because a "regular" expression in .NET is not actually regular at all and in fact it can parse the above. But this answer is still good advice in general. There's nothing wrong with posting self answers and nothing wrong with this answer, I'm just saying if he actaully *wants different opinions* rather than just a quick rep boost (and I assume he does) then he should have waited before posting to avoid discouraging other people from answering.

Mark Byers 2010-08-29 10:41:00

@mark I re-read your comment. Are you saying this CAN be done in .NET?Point me to some evidence and you'll get the accepted answer.

Cristi Diaconescu 2010-08-29 10:56:35

@EJP Wasn't familiar with the DPDA acronym so I looked it up. The 1st match on Google is hillarious, in a toilet humour way :)

Cristi Diaconescu 2010-08-29 10:58:47

@Mark, not just .NET but pretty much all regex-implementations of programming languages used nowadays can't be called "regular". If a regex implementation supports groups and back-references to those groups, or look around assertions, it isn't "regular" in the more strict definition of it.

Bart Kiers 2010-08-29 11:46:21

@Crista Diaconescu: The best info I could find on this subject was here: http://stackoverflow.com/questions/3349999/net-regex-recursive-patterns

Mark Byers 2010-08-29 12:48:00

Answer 2

+1 A:

@Cristi is right about the regex: It is theoretically impossible to solve recursive expressions using a stackless, finite-state automaton. The solution, however, is simpler: you only need to keep a counter of the number of open parentheses, and make sure it doesn't drop below 0. It is more memory-saving than maintaining a stack, and you only need the count - not the contents - of the Parentheses.

Algorithm:

counter = 0                        // Number of open parens
For char c in string:              
    print c                        
    if c=='{':                     // Keep track on number of open parens
        counter++
    if c=='}':
        counter--
    if counter==1:                 // New line if we're back to the top level
        print "\n"
    elif counter<1:                // Error if the nesting is malformed
        print "ERROR: parentheses mismatch"
        break

Adam Matan 2010-08-29 10:27:01

This doesn't take into account the escaped curlies...

Cristi Diaconescu 2010-09-01 08:28:12

True, but the fix is quite simple.

Adam Matan 2010-09-02 19:55:37

Answer 3

+4 A:

Unfortunately the answer is YES for some flavor of Regex, e.g. PCRE and .NET because they support recursive pattern and stack-like operations respectively.

The grammar can be written as

ELEMENT  -> (?!\{)\S+ | LIST
LIST     -> '\{\s*' ELEMENT? ('\s+' ELEMENT)* '\s*\}'

thus in PCRE, this can be transformed into the pattern:

   \{\s*(?0)?(?:\s+(?0))*\s*\}|(?!\{)(?:[^\s}]|\}(?![\s}]))+

#  ---------------------------                   ^^^^^^^^^
#            LIST                    Make sure the } is not closing the group

See http://www.ideone.com/SnGsU for example (I have stripped the top-level { and } for simplicity).

(Of course, don't try this at work :) )

(BTW, I don't know how to transform this PCRE into .NET flavor. If someone knows, please try http://stackoverflow.com/questions/3349999)

KennyTM 2010-08-29 10:45:25

Wow! Just one question: in your grammar definition, what does the (?!\{) at the beginning of ELEMENT mean?

Cristi Diaconescu 2010-08-29 11:27:27

@Cristi: It is a [negative lookahead](http://www.regular-expressions.info/lookaround.html).

KennyTM 2010-08-29 11:37:48

I wish I could select two answers as 'accepted answer', as this one is pretty complete. However, Kobi's answer is a better match for what I was searching, and the regex there is IMO quite a bit more readable.

Cristi Diaconescu 2010-09-01 11:58:04

Answer 4

+1 A:

Well, the edit removes curly braces from tokens and takes the sting from the question, and now it is easily doable with .Net Regexes, using balancing groups. It is simply matching braces, which is a basic example.
Much like KennyTM's answer, this will only work if you remove the top level braces, or it will match the whole input.
Again, this is better used for recreational purposes:

(?:                    # try matching...
    (?:\\[{}]|[^\s{}])+\s*? # a literal (allow escaped curly braces)
    |                       # OR
    (?<Curly>{)\s*          # "{" and push to stack
    |                       # OR
    (?<-Curly>})\s*?        # "}", pop from stack and fail if the stack is empty
)+?                    # ...a few times, and stop whenever you can.
(?(Curly)(?!))         # Make sure there aren't any extra open curly braces

For much more details see this article: Regex Balancing Group in Depth

Kobi 2010-08-30 05:05:00

Before the question was updated I couldn't get this to work. On the other question we *validate* from start to end (`^(?:...)+$`), so the engine must try every combination. However, when you *match* for tokens, the engine can satisfy with less, and it's difficult setting priorities.

Kobi 2010-08-30 05:18:30

ansaurus

tags:

views:

answers:

Find matching bracket with Regex

related questions