I came to PHP from Perl, so when I first found the preg_* functions I basically just used those. Later I read that str_replace() is faster when dealing with literal text. So my question is: can't preg_replace() be as efficient as str_replace() when the search pattern does not use special characters? Maybe just analyzing the pattern to choose between regex and plain text algorithms?
preg_replace() is good for complicated replacements (for example, turning a plain-text URL into an actual hyperlink). str_replace() is for things like changing words (a word filter, say).
If what you want to replace can only be described by a pattern, use preg_replace(); otherwise, try str_replace().
That said, trying to reproduce with str_replace() what preg does is actually slower once the job gets complicated.
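A small illustration of that split, as a sketch: the word list, the function names filterWords() and linkify(), and the deliberately simple URL pattern are all made up for the example and are not a complete URL matcher.

```php
<?php
// Literal word filter: str_replace() with arrays of plain strings, no regex involved.
// The word list is a made-up example.
function filterWords(string $text): string
{
    return str_replace(['darn', 'heck'], ['****', '****'], $text);
}

// Pattern-based job: wrap bare http(s) URLs in HTML links.
// The pattern is a deliberately simple illustration, not a full URL grammar.
// $0 in the replacement refers to the whole match.
function linkify(string $text): string
{
    return preg_replace('~https?://[^\s<]+~', '<a href="$0">$0</a>', $text);
}

echo filterWords('Oh darn, what the heck'), "\n"; // Oh ****, what the ****
echo linkify('See https://php.net for docs'), "\n";
```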
joe
In theory, yes, you're right. It is possible the PHP team could jigger preg_replace to analyze the pattern being passed in and then use the code for str_replace if it didn't see any metacharacters. Assuming the analysis wasn't too heavy, this might yield better performance.
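The idea can be sketched in userland PHP. This is a hypothetical helper, replaceWithFastPath(), not something preg_replace() actually does, and it assumes a well-formed pattern whose opening and closing delimiters are the same character. It takes the str_replace() route only when the pattern has no modifiers and neither the pattern body nor the replacement contains characters that PCRE would treat specially; the metacharacter list is simplified.

```php
<?php
// Hypothetical sketch of a "literal fast path" in front of preg_replace().
// Assumes a well-formed pattern such as '/abc/' or '#abc#i'.
function replaceWithFastPath(string $pattern, string $replacement, string $subject): string
{
    $delim = $pattern[0];
    $end = strrpos($pattern, $delim);          // position of the closing delimiter
    $body = substr($pattern, 1, $end - 1);     // text between the delimiters
    $modifiers = substr($pattern, $end + 1);   // e.g. 'i' in '/abc/i'

    // Simplified set of characters with special meaning in a PCRE pattern body.
    $meta = '\\^$.[]|()?*+{}';

    $isLiteral = $modifiers === ''
        && strpbrk($body, $meta) === false
        // '$' and '\' in the replacement could be backreferences such as $1 or \1.
        && strpbrk($replacement, '$\\') === false;

    if ($isLiteral) {
        return str_replace($body, $replacement, $subject);
    }
    return preg_replace($pattern, $replacement, $subject);
}

echo replaceWithFastPath('/123/', 'X', 'a123b'), "\n";   // aXb (str_replace path)
echo replaceWithFastPath('/1.3/', 'X', 'a1z3b'), "\n";   // aXb (regex path)
```

Even this toy version has to scan the pattern and the replacement on every call, and it already needs a small parser for delimiters and modifiers.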
However, the way the PHP source code (that is, the code used to implement PHP) is organized doesn't lend itself well to this kind of sharing. PHP is, in some ways, less a full language and more a collection of modules.
So, initially, the PHP group chose to stay away from this kind of cross-module pollination. At this point, changing the preg_replace function to do that kind of analysis would risk breaking a lot of code, and the performance improvements would be minuscule.
Finally, the analysis itself is a harder problem to solve than you'd think. Tell me, does the pattern '/123/' mean I should search for the literal text 123, or the literal text /123/?
It's easy to come up with compelling arguments for either interpretation, which introduces an additional level of confusion into using the function.
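Here is that ambiguity in code. The subject string is an arbitrary example; the point is that the same search string gives different results depending on which function interprets it. str_replace() treats every character literally, while preg_replace() reads the slashes as delimiters.

```php
<?php
$subject = 'path /123/ and 123';

// str_replace() takes the search string literally, slashes included.
$literal = str_replace('/123/', 'X', $subject);   // 'path X and 123'

// preg_replace() treats the slashes as delimiters, so it searches for "123".
$regex = preg_replace('/123/', 'X', $subject);    // 'path /X/ and X'

echo $literal, "\n", $regex, "\n";
```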
An interesting idea in theory, but in practice, and in the context of the PHP universe, it creates far more problems than it solves.
Maybe just analyzing the pattern to choose between regex and plain text algorithms?
I'd rather not be forced to escape everything that has special meaning in regular expressions every time I just want to replace some substrings.
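For comparison, this is what replacing a literal substring through preg_replace() costs in escaping. The subject and needle are made-up examples; preg_quote() escapes the regex metacharacters, and its second argument also escapes the chosen delimiter.

```php
<?php
$subject = 'Price: $5.00 (approx.)';
$needle = '$5.00 (approx.)';

// With str_replace() the needle is used exactly as written.
$a = str_replace($needle, 'five dollars', $subject);

// With preg_replace() every metacharacter in the needle ($ . ( ) here)
// must be escaped first, plus the '/' delimiter if it appeared.
$b = preg_replace('/' . preg_quote($needle, '/') . '/', 'five dollars', $subject);

echo $a, "\n", $b, "\n"; // both: Price: five dollars
```

The replacement string would need similar care: in preg_replace() a $ or \ followed by a digit is read as a backreference.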
I guess the difference in speed comes from the overhead the regex parser/engine adds compared to how the str_* functions operate. But I'm just guessing here. When in doubt, benchmark and see whether it's faster or the same speed :)
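A rough way to run that benchmark, assuming PHP 7.4+ (arrow functions; hrtime() needs 7.3). The timeIt() helper, the subject string and the iteration count are arbitrary choices; absolute numbers depend on the PHP version, whether the PCRE JIT is enabled, and the input, so treat the output as indicative only.

```php
<?php
// Runs $fn $iterations times and returns the elapsed wall time in milliseconds.
function timeIt(callable $fn, int $iterations = 20000): float
{
    $start = hrtime(true);                     // monotonic clock, nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e6;      // nanoseconds -> milliseconds
}

$subject = str_repeat('the quick brown fox ', 50);

// Same literal replacement done both ways.
$strMs  = timeIt(fn() => str_replace('fox', 'cat', $subject));
$pregMs = timeIt(fn() => preg_replace('/fox/', 'cat', $subject));

printf("str_replace:  %.1f ms\npreg_replace: %.1f ms\n", $strMs, $pregMs);
```

Check that both calls produce the same output before comparing timings; otherwise you are benchmarking two different jobs.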
There is a lengthy and detailed article about Regular Expression Matching Speed, and Wikipedia has some information on Implementations and Running Times, as well as a Comparison of Regular Expression Engines.
Despite their similarities, the two functions are quite different and thus not interchangeable. For example, the replacement string in preg_replace can contain backreferences to text captured by the regular expression:
preg_replace('/(\w+) apple/', '$1 pear', 'A red apple'); // => 'A red pear'
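Put side by side with a naive str_replace() attempt, the difference is easy to see: str_replace() has no notion of captured groups, so $1 is inserted literally. The variable names are just for the example.

```php
<?php
// preg_replace(): $1 is the text captured by (\w+), here "red".
$withPreg = preg_replace('/(\w+) apple/', '$1 pear', 'A red apple'); // 'A red pear'

// str_replace(): the replacement is plain text, so "$1" appears verbatim.
$withStr = str_replace('apple', '$1 pear', 'A red apple');          // 'A red $1 pear'

echo $withPreg, "\n", $withStr, "\n";
```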
This is how it works in JavaScript:
alert("a.b".replace(".", "X")) // aXb
alert("a.b".replace(/./, "X")) // X.b
That is, one function can accept both plain substrings and regexp literals. Regexp literals are extremely handy, and the whole string library can be made smaller and more flexible (think of one single split instead of "explode" and "preg_split", or one pos instead of "strpos" and "preg_match", etc.).
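PHP has no regexp literals, but the JavaScript behavior can be approximated in userland. A minimal sketch, assuming PHP 8.0+ (union types, constructor promotion): Regex and replace() are hypothetical names, not part of PHP. The *type* of the search argument decides between substring and regex matching, so nothing has to be guessed from the text of the pattern. Like the JavaScript calls, both branches replace only the first occurrence.

```php
<?php
// Hypothetical wrapper that marks a string as a PCRE pattern.
final class Regex
{
    public function __construct(public string $pattern) {}
}

// Hypothetical JS-style replace(): the argument type picks the algorithm.
function replace(string|Regex $search, string $replacement, string $subject): string
{
    if ($search instanceof Regex) {
        // Like /./ without the g flag: replace the first match only.
        return preg_replace($search->pattern, $replacement, $subject, 1);
    }
    // Like "a.b".replace(".", ...): plain substring, first occurrence only.
    $pos = strpos($subject, $search);
    return $pos === false
        ? $subject
        : substr_replace($subject, $replacement, $pos, strlen($search));
}

echo replace('.', 'X', 'a.b'), "\n";              // aXb
echo replace(new Regex('/./'), 'X', 'a.b'), "\n"; // X.b
```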
That being said, I highly doubt regexp literals will be added to PHP any time soon.
Maybe just analyzing the pattern to choose between regex and plain text algorithms?
That analysis alone would reduce performance. Also, the preg_*() functions use a library (PCRE) that isn't necessary for simpler string operations.
It is not possible to replace str_replace() with preg_replace(), because the function could not tell whether I am trying to do pattern matching or a plain string replacement. It would be possible if the new function accepted an extra parameter, but then you would introduce an incompatibility with old code.
Changing preg_replace() so that it recognizes when a plain string replacement is wanted would not make it faster. It would have to inspect the pattern passed as argument to work out that I am asking to replace one string with another, and that check takes time that could otherwise be spent on the pattern matching itself.