I'm having trouble with a regular expression in PHP that uses a potentially empty backreference. I was hoping that it would work as explained in http://www.regular-expressions.info/brackets.html:

If a backreference was not used in a particular match attempt (such as in the first example where the question mark made the first backreference optional), it is simply empty. Using an empty backreference in the regex is perfectly fine. It will simply be replaced with nothingness.

However, PHP seems to behave differently. From http://php.net/manual/en/regexp.reference.back-references.php:

If a subpattern has not actually been used in a particular match, then any back references to it always fail.
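
A minimal sketch of the behavior the PHP manual describes, using a made-up toy pattern (`(a)?b\1` is an illustration, not part of the real problem). It assumes PHP's default PCRE settings:

```php
<?php
// Toy pattern for illustration only: an optional group followed by a backreference to it.
$pattern = '/(a)?b\1/';

// Group 1 captures "a", so \1 has something to match against.
$withGroup = preg_match($pattern, 'aba');

// Group 1 does not participate in the match, so the backreference \1 fails
// and the whole match fails with it.
$withoutGroup = preg_match($pattern, 'b');

var_dump($withGroup, $withoutGroup);
```

With default settings this should report a match for the first subject and no match for the second.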

As a simplified example, I want to match the following two things with this regex:

  • {something} ... {/something}
  • {something:else} ... {/something:else}

Where "something" is known ahead of time, and "else" can be anything (or nothing).

So I tried the following regex ("else" hardcoded for simplicity):

preg_match("/\{(something(:else)?)\}(.*?)\{\/something\\2\}/is", $data, $matches)

Unfortunately, if (:else)? doesn't match, the \2 backreference fails. If I make \2 optional (\2?), then I also match {something:else} ... {/something}, which is no good.

Have I run into a limitation of regular expressions (the infamous "you need a parser, not a regex") or is this fixable?

Test program:

<?php
    $data = "{something} ... {/something}
             {something:else} ... {/something:else}
             {something:else} ... {/something}";

    // won't match {something} ... {/something}
    preg_match_all("/\{(something(:else)?)\}(.*?)\{\/something\\2\}/is", $data, $matches);
    print_r($matches);

    // change \\2 to \\2? and it matches too much
    preg_match_all("/\{(something(:else)?)\}(.*?)\{\/something\\2?\}/is", $data, $matches);
    print_r($matches);
?>
+2  A: 

Well, why not replace the ? with an alternation that has an empty branch?

Change

"/\{(something(:else)?)\}(.*?)\{\/something\\2\}/is"

To

"/\{(something(:else|))\}(.*?)\{\/something\\2\}/is"

That way the group always takes part in the match, but it will sometimes capture an empty string (which is OK, because a backreference to an empty capture still succeeds).
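
A quick sketch to check the alternation pattern above against the three cases from the question's test program. The `$cases` and `$results` names are only for this illustration:

```php
<?php
// Pattern from this answer: (:else|) always participates, so \2 is always set.
$pattern = "/\{(something(:else|))\}(.*?)\{\/something\\2\}/is";

// The three inputs from the question, tested one at a time.
$cases = [
    '{something} ... {/something}',
    '{something:else} ... {/something:else}',
    '{something:else} ... {/something}',
];

$results = [];
foreach ($cases as $case) {
    // preg_match returns 1 on a match and 0 on no match.
    $results[$case] = preg_match($pattern, $case);
}

print_r($results);
```

The first two inputs should match and the mismatched third one should not, which is the behavior the question asks for.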

ircmaxell
Aha! Knew I just needed another pair of eyes. It appears that will work.
Ty W
+1  A: 

Why don't you simply use \1 instead of \2?

preg_match_all("/\{(something(:else)?)\}(.*?)\{\/\\1\}/is", $data, $matches);

As for the "you need a parser" problem: you will need one to parse nested constructs.
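
A small sketch of the \1 variant run over the same three lines as the question's test program. Here the data is built with implode for readability, and `$count` is a name introduced for this illustration:

```php
<?php
// Same three lines as the question's test data.
$data = implode("\n", [
    '{something} ... {/something}',
    '{something:else} ... {/something:else}',
    '{something:else} ... {/something}',
]);

// \1 refers to the whole opening tag name, which always participates in the match,
// so the optional (:else)? group never needs its own backreference.
$count = preg_match_all("/\{(something(:else)?)\}(.*?)\{\/\\1\}/is", $data, $matches);

// $matches[1] holds the tag name captured for each match.
print_r($matches[1]);
```

Only the first two lines should match, since the third has no closing tag that repeats the full opening name.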

stereofrog
ah yeah, that'd also probably work. thanks :)
Ty W
both answers solved the problem, this one seemed "cleaner" to me.
Ty W