When using preg_replace() in PHP with strings generated at runtime, one can protect special regex characters (such as '$' or '+') in the search string by using preg_quote(). But what's the correct way to handle this in the replacement string? Take this code for example:
<?php
$haystack = '...a bit of sample text...';
$replacement = '\\HELLO WORLD$1.+-';
$replacement_quoted = preg_quote($replacement);
var_dump('--replacement', $replacement, '--replacement_quoted',
$replacement_quoted, '--haystack', $haystack);
$result1 = preg_replace("@(bit) (of) (sample)@is", "\${1}" . $replacement ."$3", $haystack);
$result2 = preg_replace("@(bit) (of) (sample)@is", "\${1}" . $replacement_quoted ."$3", $haystack);
// Escape only "$" so that "$1" is no longer read as a capture-group reference.
$replacement_new1 = str_replace('$', '\$', $replacement);
// Escape "\" first, then "$": doing it the other way round would also double
// the backslash that was just added in front of "$".
$replacement_new2 = str_replace(array('\\', '$'), array('\\\\', '\$'), $replacement);
$result3 = preg_replace("@(bit) (of) (sample)@is", "\${1}" . $replacement_new1 ."$3", $haystack);
$result4 = preg_replace("@(bit) (of) (sample)@is", "\${1}" . $replacement_new2 ."$3", $haystack);
var_dump('--result1 (not quoted)', $result1, '--result2 (quoted)', $result2,
'--result3 ($ escaped)', $result3, '--result4 (\ and $ escaped)', $result4);
?>
Here's the output:
string(13) "--replacement"
string(17) "\HELLO WORLD$1.+-"
string(20) "--replacement_quoted"
string(22) "\\HELLO WORLD\$1\.\+\-"
string(10) "--haystack"
string(26) "...a bit of sample text..."
string(22) "--result1 (not quoted)"
string(40) "...a bit\HELLO WORLDbit.+-sample text..."
string(18) "--result2 (quoted)"
string(42) "...a bit\HELLO WORLD$1\.\+\-sample text..."
string(21) "--result3 ($ escaped)"
string(39) "...a bit\HELLO WORLD$1.+-sample text..."
string(27) "--result4 (\ and $ escaped)"
string(39) "...a bit\HELLO WORLD$1.+-sample text..."
As you can see, you can't win with preg_quote(). If you don't call it and just pass the string in unmodified (result1), anything that looks like a capture reference ($1 above) gets replaced with whatever the corresponding capture group contained. If you do call it (result2), the capture references are no longer a problem, but every other special PCRE character (here ., + and -) gets escaped as well, and those escaping backslashes survive into the output. Also interesting to me is that both versions produce a single \ in the output.
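A stripped-down reproduction of those cases may make the behaviour easier to see. This is only a sketch: the pattern, the subject string 'abc', and the variable names are made up for illustration and are not part of the listing above.

```php
<?php
$pattern = '/(b)/';

// Unescaped: "$1" is treated as a reference to capture group 1.
$expanded = preg_replace($pattern, 'x$1y', 'abc');
var_dump($expanded); // string(5) "axbyc"

// "$" escaped by hand: the replacement parser emits a literal "$".
$literal = preg_replace($pattern, 'x\$1y', 'abc');
var_dump($literal); // string(6) "ax$1yc"

// preg_quote() escapes "." for use in a *pattern*; in a replacement string
// the added backslash is not consumed and ends up in the output.
$quoted = preg_replace($pattern, preg_quote('x.y'), 'abc');
var_dump($quoted); // string(6) "ax\.yc"
```

In other words, preg_quote() targets the pattern syntax, which is a different (and larger) set of special characters than the replacement syntax.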
Only by manually escaping characters, in particular the $, can you get this to work. This can be seen in result3 and result4. Continuing the oddness with the \, however, both result3, which escapes only the $, and result4, which also escapes the \, again produce a single \ in the output. Adding six \ characters at the beginning of the replacement string produces just two \ in the final output for result1, result3, and result4, and three of them for result2.
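The manual escaping can at least be pulled into one place. The following is a minimal sketch, assuming that \ and $ are the only characters preg_replace() treats specially in a replacement string; escape_replacement() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper (not part of PHP): make a string safe to use as
// literal text inside the replacement argument of preg_replace().
function escape_replacement($text)
{
    // strtr() with an array works in a single pass, so the backslash added
    // in front of "$" is never escaped a second time.
    return strtr($text, array('\\' => '\\\\', '$' => '\$'));
}

$haystack    = '...a bit of sample text...';
$replacement = '\\HELLO WORLD$1.+-';

$result = preg_replace(
    '@(bit) (of) (sample)@is',
    '${1}' . escape_replacement($replacement) . '$3',
    $haystack
);
var_dump($result); // string(39) "...a bit\HELLO WORLD$1.+-sample text..."
```

Using single-quoted strings for the fixed parts of the replacement also avoids the "\${1}" spelling, since PHP does not interpolate variables there.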
So, it would seem that most issues are taken care of by manually escaping the $ character. It seems like the \ character also needs to be escaped, but I need to think about that one some more to figure out exactly what's happening. In any case, this is all quite ugly: between the annoying \${1} syntax and having to manually escape certain characters, the code smells rotten and error-prone. Is there something I'm missing? Is there a clean way to do this?
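One possible direction, sketched here rather than offered as the definitive answer, is preg_replace_callback(): the string returned by the callback is inserted as-is and is not scanned for $n or \n references, so no escaping is needed. The sketch assumes PHP 5.3 or later for the closure syntax, and the variable names are illustrative.

```php
<?php
$haystack    = '...a bit of sample text...';
$replacement = '\\HELLO WORLD$1.+-';

$result = preg_replace_callback(
    '@(bit) (of) (sample)@is',
    function ($m) use ($replacement) {
        // $m[1] and $m[3] are the captured groups; $replacement is
        // concatenated verbatim, with no escaping step.
        return $m[1] . $replacement . $m[3];
    },
    $haystack
);
var_dump($result); // string(39) "...a bit\HELLO WORLD$1.+-sample text..."
```

The trade-off is a little more ceremony per call in exchange for not having to think about the replacement syntax at all.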