




In the accepted answer on stripping all characters from a string except numbers, the author added a + after the character class:

$str = preg_replace('/[^0-9.]+/', '', $str);

so that it matches whole substrings to remove, rather than single characters. Functionally the + is optional, but I started to wonder whether adding it makes the replacement faster or slower. (Or is there no difference at all?)

I would assume it is faster, because there is less string and memory handling. But I could also imagine that a more complex regular expression is slower than a simple one.

So when using this technique to remove substrings, should one try to match large or small substrings?


I haven't done any tests, but with the + each match covers more characters, so the replacement should be executed fewer times. Without the + in the regexp, the replacement is done for every single character instead of for an entire substring, so I think it's slower.
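
The "fewer replacements" part of this can be checked directly, since preg_replace reports how many replacements it made through its optional fifth argument. A minimal sketch, assuming a made-up sample string (it is not taken from the question):

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$str = 'abc123def45.6ghi';

// Same character class, with and without the + quantifier.
// The fifth argument receives the number of replacements performed.
$withPlus    = preg_replace('/[^0-9.]+/', '', $str, -1, $countPlus);
$withoutPlus = preg_replace('/[^0-9.]/',  '', $str, -1, $countSingle);

var_dump($withPlus === $withoutPlus);           // both variants give the same string
echo "with +:    $countPlus replacements\n";    // one per run of non-digit characters
echo "without +: $countSingle replacements\n";  // one per non-digit character
```

This only shows that the output is identical and that the number of replacements differs. It says nothing about which variant is faster.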

That is what I thought, but what about the regex itself? Is it (much) slower when the + is added? And if so, how does that compare to the higher replacement speed due to fewer matches?
I don't know for sure, but I think that matching more characters in the same match doesn't affect the performance of the regexp much.
+1  A: 

Don't read too much into benchmark results. They're incredibly hard to do well. Really, the only thing you should take from this is that the repetition might be faster on certain types of strings, where the span of repetition is long.

This type of thing can easily change with a different version of PCRE.

// time a single preg_replace call, in seconds
function tst($pat, $str) {
    $start = microtime(true);
    preg_replace($pat, '', $str);
    return microtime(true) - $start;
}

$strs = array(
    'letters' => str_repeat("a", 20000),
    'numbers' => str_repeat("1", 20000),
    'mostly_letters' => str_repeat("aaaaaaaaaaaaa5", 20000),
    'mostly_numbers' => str_repeat("5555555555555a", 20000)
);
$pats = array(
    'rep' => '/[^0-9.]+/',
    'norep' => '/[^0-9.]/'
);

// precompile patterns (PHP caches them per script) and warm up microtime
preg_replace($pats['rep'], '', 'foo');
preg_replace($pats['norep'], '', 'foo');

foreach ($strs as $strname => $str) {
    echo "$strname\n";
    foreach ($pats as $patname => $pat) {
        printf("%10s    %.5f\n", $patname, tst($pat, $str));
    }
}
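
Since a single timed call is noisy, one way to make the numbers a bit more trustworthy is to repeat each measurement and report the median. This is only a sketch under assumptions: median_time is a hypothetical helper name, the run count of 15 is arbitrary, and hrtime() requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: call $fn $runs times and return the median
// wall-clock time in seconds (upper middle value for even run counts).
function median_time(callable $fn, int $runs = 15): float {
    $times = array();
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                    // monotonic clock, nanoseconds
        $fn();
        $times[] = (hrtime(true) - $start) / 1e9; // convert to seconds
    }
    sort($times);
    return $times[intdiv($runs, 2)];
}

$str  = str_repeat('aaaaaaaaaaaaa5', 20000);
$pats = array('rep' => '/[^0-9.]+/', 'norep' => '/[^0-9.]/');

foreach ($pats as $name => $pat) {
    $t = median_time(function () use ($pat, $str) {
        preg_replace($pat, '', $str);
    });
    printf("%10s    %.5f\n", $name, $t);
}
```

The median is less sensitive than a single run to one-off hiccups such as pattern compilation or the scheduler, but the caveat about PCRE versions still applies.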

Oh nice, I will certainly test it soon (when I get the chance). Maybe add str_replace(array("0",...,"9","."), '', $str) into the mix as well... :)

I ran some speed tests as chris suggested. Compared to his code, I:

  • added a str_replace for comparison:
$str_replace_array = array('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.');

// an empty pattern selects the str_replace variant
function tst($pat, $str) {
    global $str_replace_array;
    $start = microtime(true);
    if ($pat == '') {
        str_replace($str_replace_array, '', $str);
    } else {
        preg_replace($pat, '', $str);
    }
    return microtime(true) - $start;
}
  • made all strings the same length, so the results could be compared better

The results (in seconds):

letters
         rep    0.00298
       norep    0.06953
 str_replace    0.00406

numbers
         rep    0.02867
       norep    0.02612
 str_replace    0.01242

mostly_letters
         rep    0.00931
       norep    0.06649
 str_replace    0.00593

mostly_numbers
         rep    0.03285
       norep    0.02942
 str_replace    0.01359

This shows that the repeating regex (with the + added) is much faster when larger blocks are replaced (less memory handling?). The non-repeating regex is slightly faster when little needs to be replaced.

Furthermore, in these runs str_replace is almost always faster than the regex replacement (roughly twice as fast), except when the regex matches the complete string.
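
One caveat when reading the str_replace timings: as written above, the str_replace call removes the digits and the dot, while the regex removes everything else, so the two do different work on the same input. A small sketch with a made-up sample string illustrates this:

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$str = 'abc 12.5 kg';

$digits = array('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.');

$viaRegex      = preg_replace('/[^0-9.]+/', '', $str); // keeps digits and dots
$viaStrReplace = str_replace($digits, '', $str);       // removes digits and dots

var_dump($viaRegex);      // string(4) "12.5"
var_dump($viaStrReplace); // string(7) "abc  kg"
```

So the str_replace rows are probably best read as a rough cost comparison, not as a drop-in alternative to the regex.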
