tags:

views:

581

answers:

2

.NET balanced group regexes make my head explode. I've got this string i'm trying to match:

other stuff blah blah....
                    {
                        stuff stuff
                        {key:
                            stuff
                            stuff
                        }
                    } more stuff.....

Here's my regex:

[^{}]*                      # anything that isn't { }
\{                          # starting with {
(?>                         # atomic group: throw away backtracks on exit
    [^{}]+
  |
    \{(?:\w+:)?(?<DEPTH>)   # on matching { plus optional "word:" push stack  
  |
    \}(?<-DEPTH>)           # on matching } pop stack
)*
(?(DEPTH)(?!))              # if depth unbalanced, fail match
\}                          # ending with }
[^{}]*                      # anything that isn't { }

So, I'm trying to match balancing curly braces, where some of the opening curly braces have an optional word followed by a colon. The above regex matches my example string, but if i remove a curly brace, (i.e. "unbalancing it):

    other stuff blah blah....
                    {
                        stuff stuff
                        {key:
                            stuff
                            stuff

                    } more stuff.....

...it still matches!

Can anyone tell me how to fix my regex?

A: 

Not to be a kill joy but what you're attempting to do with a regular expression is boggling your mind because it's simply not possible. Regular expressions are a class of finite automata and do not posses enough state in which to peform recursive/nested matching. You'll need a context free grammar of sorts in order to get this working.

There are some regular expression engines which do support a notion of recursion. These are not strictly speaking regular expressions though. Can you tell us what engine you are using because it's possible it has a recursion function which will help out this scenario.

JaredPar
Subject line says .NET Regex. I already know the engine supports it. See http://blog.stevenlevithan.com/archives/balancing-groups
Matt
+4  A: 

Have you checked what it is matching in the second case? Since you don't have any anchors, I think the engine starts the match just after the first '{'. From there, until the end, the string matches.

Try surrounding the pattern with \A and \z.

MizardX