I'm having trouble with a regular expression in PHP that uses a potentially empty backreference. I was hoping that it would work as explained in http://www.regular-expressions.info/brackets.html:
If a backreference was not used in a particular match attempt (such as in the first example where the question mark made the first backreference optional), it is simply empty. Using an empty backreference in the regex is perfectly fine. It will simply be replaced with nothingness.
However, PHP seems to behave differently. From http://php.net/manual/en/regexp.reference.back-references.php:
If a subpattern has not actually been used in a particular match, then any back references to it always fail.
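Here is a minimal sketch of what that sentence from the PHP manual means in practice, as far as I can tell. The pattern and the test strings are made up purely for illustration and are not my real data:

```php
<?php
// Hypothetical pattern for illustration: group 1 is optional, and \1 refers back to it.
$pattern = '/^(a)?b\1$/';

// Group 1 takes part in the match (it captures "a"), so \1 can match the trailing "a".
$withGroup = preg_match($pattern, 'aba');

// Group 1 never takes part in the match, so the \1 backreference fails
// and the whole match fails with it.
$withoutGroup = preg_match($pattern, 'b');

var_dump($withGroup, $withoutGroup);
```

This prints int(1) and then int(0): the second string is rejected even though an "empty" \1 would have let it match.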
As a simplified example, I want to match the following two things with this regex:
- {something} ... {/something}
- {something:else} ... {/something:else}
Where "something" is known ahead of time, and "else" can be anything (or nothing).
So I tried the following regex ("else" hardcoded for simplicity):
preg_match("/\{(something(:else)?)\}(.*?)\{\/something\\2\}/is", $data, $matches)
Unfortunately, if (:else)? doesn't take part in the match, the \2 backreference fails. If I make \2 optional (\2?), then I also match {something:else} ... {/something}, which is no good.
Have I run into a limitation of regular expressions (the infamous "you need a parser, not a regex"), or is this fixable?
Test program:
<?php
$data = "{something} ... {/something}
{something:else} ... {/something:else}
{something:else} ... {/something}";
// won't match {something} ... {/something}
preg_match_all("/\{(something(:else)?)\}(.*?)\{\/something\\2\}/is", $data, $matches);
print_r($matches);
// change \\2 to \\2? and it matches too much
preg_match_all("/\{(something(:else)?)\}(.*?)\{\/something\\2?\}/is", $data, $matches);
print_r($matches);
?>