Well, there are other (working) ways to do this, but the question is: why does this one fail?

/
\A              # start of the string
(               # group 1
(?:             # group 2
[^()]*          # something other than parentheses (greedy)
|               # or
\( (?1) \)      # parenthesized group 1
)               # -group 2
+               # at least once (greedy)
)               # -group 1
\Z              # end of the string
/x

It fails to match a string with nested parentheses, such as "(())".

+5  A: 

It doesn't fail:

$ perl junk.pl
matched junk >(())<

$ cat junk.pl
my $junk = qr/
\A              # start of the string
(               # group 1
(?:             # group 2
[^()]*          # something other than parentheses (greedy)
|               # or
\( (?1) \)      # parenthesized group 1
)               # -group 2
+               # at least once (greedy)
)               # -group 1
\Z              # end of the string
/x;

if( "(())" =~ $junk ){
    print "matched junk >$1<\n";
}
junk
+2  A: 

Wow! Thank you, junk! It really works... in Perl, but not in PCRE. So the question mutates into "What's the difference between Perl and PCRE regex pattern matching?"

And voilà, the PCRE documentation has an answer:

Recursion difference from Perl

 In PCRE (like Python, but unlike Perl), a recursive subpattern call is
 always treated as an atomic group. That is, once it has matched some of
 the subject string, it is never re-entered, even if it contains untried
 alternatives and there is a subsequent matching failure.
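To see what this means for the pattern from the question, here is a minimal Perl sketch. It assumes that wrapping the recursive call in an atomic group, `(?>(?1))`, is a fair approximation of the PCRE behaviour described in the quote; that wrapping is my own emulation, not something PCRE or the original pattern does literally. The variable names `$plain` and `$atomic` are made up for the illustration.

```perl
use strict;
use warnings;

# Original pattern from the question: Perl may backtrack into (?1).
my $plain = qr/
    \A
    (
        (?:
            [^()]*          # something other than parentheses
        |
            \( (?1) \)      # parenthesized group 1
        )+
    )
    \Z
/x;

# Same pattern, but the recursive call sits inside (?> ... ), so once it
# has matched it cannot be re-entered (assumed to approximate PCRE).
my $atomic = qr/
    \A
    (
        (?:
            [^()]*
        |
            \( (?>(?1)) \)  # recursion treated as atomic
        )+
    )
    \Z
/x;

for my $s ("()", "(())") {
    printf "%-6s plain=%s atomic=%s\n", $s,
        ($s =~ $plain  ? "match" : "no match"),
        ($s =~ $atomic ? "match" : "no match");
}
```

If the emulation is faithful, the atomic variant should reject "(())": inside the recursive call, `[^()]*` succeeds at once on the empty string, the call is then closed to backtracking, and the following `\)` meets the second "(" instead.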

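With the original order, `[^()]*` inside the recursive call immediately matches the empty string, and because the call is atomic, the `\( (?1) \)` branch is never tried. Therefore, we just need to swap the two alternatives so that the parenthesized branch is tried first: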

/ \A ( (?: \( (?1) \) | [^()]* )+ ) \Z /x
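A quick way to sanity-check the swapped pattern is to run it against a few balanced and unbalanced strings. The sketch below is in Perl and again wraps the recursion in `(?>(?1))` to mimic an atomic call; that wrapping, the name `$swapped`, and the sample strings are my own assumptions for illustration, not part of the PCRE documentation.

```perl
use strict;
use warnings;

# Swapped alternatives: the parenthesized branch is tried first.
# The recursion is wrapped in (?> ... ) to mimic an atomic call (assumption).
my $swapped = qr/
    \A
    (
        (?:
            \( (?>(?1)) \)  # parenthesized group 1, tried first
        |
            [^()]*          # something other than parentheses
        )+
    )
    \Z
/x;

# Two balanced and two unbalanced sample strings (hypothetical test data).
my @samples = ("(())", "a(b(c)d)e", "(()", "())");
for my $s (@samples) {
    print "$s: ", ($s =~ $swapped ? "match" : "no match"), "\n";
}
```

The two balanced samples should match and the two unbalanced ones should not. Running the pattern under PCRE itself (for example with pcretest) remains the authoritative check.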

Thank you!

Anton N. Petrov