ansaurus

Question

Unexpected speed behaviour when benchmarking Perl regexs

Answer 1

+2 A:

Well, your case "Using regex" is so slow because you are compiling it each time. Try moving it out of the subroutine.

Chas. Owens 2010-10-08 09:08:49

But isn't the OP specifically trying to compare compilation speed? So pulling that out of the loop is not helpful here, it renders the benchmarking moot. :)

Ether 2010-10-08 16:45:46

@Ether But the is no point in using a compiled regex that way. Compiled regexes only have two benefits: they are compiled once and they are properly wrapped in non-capturing groups that maintain the options you gave them even when interpolated into other regexes (e.g. `my $any_foo = qr/foo/i; my $any_foo_lower_bar = qr/${any_foo}bar;`)

Chas. Owens 2010-10-08 19:56:11

Moving the regular expression outside of the subroutine makes no difference to the benchmark.

andeyatz 2010-10-13 13:09:45

Answer 2

+2 A:

Wholesale revision

In light of @Ven'Tatsu's comment, I changed the benchmark a bit:

use strict; use warnings;
use Benchmark qw(cmpthese);

use Data::Random qw( rand_words );
use Data::Random::WordList;

my $wl = Data::Random::WordList->new;

my @data_1 = (rand_words( size => 10000 )) x 10;
my @data_2 = @data_1;

my $pat = 'a(?=b)';
my $re = qr/^$pat/;

cmpthese(1, {
    'qr/$search/' => sub {
        my $instances = grep /$re/, @data_1;
        return;
    },
    'm/$search/' => sub {
        my $search = 'a(?=b)';
        my $instances = grep /^$search/, @data_2;
        return;
    },
});

On Windows XP with ActiveState perl 5.10.1:

              Rate qr/$search/  m/$search/
qr/$search/ 5.40/s          --        -73%
m/$search/  20.1/s        272%          --

On Windows XP with Strawberry perl 5.12.1:

              Rate qr/$search/  m/$search/
qr/$search/ 6.42/s          --        -66%
m/$search/  18.6/s        190%          --

On ArchLinux with bleadperl:

              Rate qr/$search/  m/$search/
qr/$search/ 9.25/s          --        -38%
m/$search/  14.8/s         60%          --

Sinan Ünür 2010-10-08 11:23:48

Regexp compilation got some attention in 5.10.x, under 5.8.8 `qr/$search/` and `m/$search/` are within 1% of each other. An additional issue is the complexity of the pattern. Changing the pattern to `9{2}` pushes `qr/$search/` just slightly ahead for me. Any more complex expression (like `9(?=9).`) and it becomes the clear winner over `m/$search/`.

Ven'Tatsu 2010-10-08 15:14:31

I'm curious if /o makes a difference at all, but not curious enough to actually try right now. :)

brian d foy 2010-10-08 21:12:21

Thanks for pointing out the `Data::Random::WordList` module @Sinan which would give me more realistic results. As to timings the point was to build a benchmark which would compare using index to detect a substring to using a regular expression for the same purpose. Therefore changing the regex is really shifting the goal-posts

andeyatz 2010-10-13 13:07:23

@andeyatz Check the previous versions of my post for timings of `qr` versus `index`. The peculiar thing to me is not any difference between `index` versus pattern matching. The counter-intuitive outcome of the first benchmarks was the fact that `qr/$var/` turned out to be much slower than `m/$var/` even though the compiled patters and the actual output of `re debug` seems identical to me.

Sinan Ünür 2010-10-13 14:06:21

@Sinan sorry for not noting that earlier on. This is identical to what I'm seeing then. When I turned on `re debug` it seemed to say that all regular expressions were equal but the micro-benchmark doesn't lie. It seems that there is a significant penalty for using a `qr//` regex over an inline one. Could this be caused by using a thread compiled version of Perl?

andeyatz 2010-10-14 15:42:11

Answer 3

+1 A:

Perl optimizes a lot of things. Your pattern with no special regex features and literal characters allows perl's regex engine to simplify many things. Using use re 'debug' can show you what's actually happening behind the scenes.

brian d foy 2010-10-08 18:24:31

I would expect matching against a precompiled `qr/$var/` to represent an upper bound on how fast `m/$var/` can be, but maybe my expectation is wrong.

Sinan Ünür 2010-10-08 19:30:27

All regexs compile in an identical manner according to the `re` pragma

andeyatz 2010-10-13 13:10:35

ansaurus

tags:

views:

answers:

Unexpected speed behaviour when benchmarking Perl regexs

Wholesale revision

related questions