ansaurus

Question

PCRE (recursive) pattern that matches a string containing a correctly parenthesized substring. Why does this one fail?

Answer 1

+5 A:

It doesn't fail, at least not in Perl:

$ perl junk.pl
matched junk >(())<

$ cat junk.pl
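my $junk = qr/
\A              # start of the string
(               # start of capturing group 1
(?:             # start of non-capturing group
[^()]*          # a run of non-parenthesis characters (greedy, may be empty)
|               # or
\( (?1) \)      # a parenthesized recursion into group 1
)               # end of non-capturing group
+               # at least once (greedy)
)               # end of group 1
\Z              # end of the string (or just before a final newline)
/x;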

if( "(())" =~ $junk ){
    print "matched junk >$1<\n";
}

junk 2010-06-04 12:02:31

Answer 2

+2 A:

Wow! Thank you, junk! It really does work in Perl, but not in PCRE. So the question becomes: "What is the difference between Perl and PCRE regex pattern matching?"

And voilà! The PCRE documentation has the answer:

Recursion difference from Perl

 In PCRE (like Python, but unlike Perl), a recursive subpattern call is
 always treated as an atomic group. That is, once it has matched some of
 the subject string, it is never re-entered, even if it contains untried
 alternatives and there is a subsequent matching failure.
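
To make "atomic group" concrete, here is a minimal Perl sketch using the explicit atomic syntax (?>...). The subject string and variable names are my own, chosen only for illustration; it shows atomic behaviour in general, not PCRE's recursion specifically.

```perl
use strict;
use warnings;

# Ordinary group: a* first takes "aaa", then gives one "a" back
# so that the following "ab" can still match.
my $backtracking = "aaab" =~ /\A(?:a*)ab\z/ ? "match" : "no match";

# Atomic group: a* takes "aaa" and is never re-entered,
# so the following "ab" has nothing left to match.
my $atomic = "aaab" =~ /\A(?>a*)ab\z/ ? "match" : "no match";

print "backtracking: $backtracking\n";   # backtracking: match
print "atomic:       $atomic\n";         # atomic:       no match
```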

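With [^()]* listed first, the recursive call is satisfied by an empty match and, being atomic, can never be re-entered to try the parenthesized alternative, so the following \) fails. Therefore, we just need to swap the two alternatives: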

/ \A ( (?: \( (?1) \) | [^()]* )+ ) \Z /x
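
The effect of the swap can be reproduced in Perl itself by wrapping the recursive call in an atomic group, (?>(?1)). This is only a sketch under the assumption that the wrapper is a fair stand-in for what PCRE does; the variable names and test strings are mine.

```perl
use strict;
use warnings;

# Original order: plain text first, parenthesized recursion second.
# (?>(?1)) makes the recursive call atomic (my stand-in for PCRE).
my $text_first  = qr/ \A ( (?: [^()]* | \( (?>(?1)) \) )+ ) \Z /x;

# Swapped order: parenthesized recursion first, plain text second.
my $paren_first = qr/ \A ( (?: \( (?>(?1)) \) | [^()]* )+ ) \Z /x;

for my $subject ("(())", "a(b(c)d)e", "(()") {
    my $first  = $subject =~ $text_first  ? "match" : "no match";
    my $second = $subject =~ $paren_first ? "match" : "no match";
    print "$subject  text-first: $first  paren-first: $second\n";
}
```

For "(())" I would expect the text-first pattern to report no match and the paren-first pattern to report a match, which mirrors what I see with PCRE. The unbalanced "(()" should be rejected by both.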

Thank you!

Anton N. Petrov 2010-06-05 23:09:19
