Whilst discussing the relative merits of using index()
in Perl to search for substrings I decided to write a micro benchmark to prove what I had seen before than index is faster than regular expressions when looking for a substring. Here is the benchmarking code:
use strict;
use warnings;
use Benchmark qw(:all);
my @random_data;
for (1..100000) {
push(@random_data, int(rand(1000)));
}
my $warn_about_counts = 0;
my $count = 100;
my $search = '99';
cmpthese($count, {
'Using regex' => sub {
my $instances = 0;
my $regex = qr/$search/;
foreach my $i (@random_data) {
$instances++ if $i =~ $regex;
}
warn $instances if $warn_about_counts;
return;
},
'Uncompiled regex with scalar' => sub {
my $instances = 0;
foreach my $i (@random_data) {
$instances++ if $i =~ /$search/;
}
warn $instances if $warn_about_counts;
return;
},
'Uncompiled regex with literal' => sub {
my $instances = 0;
foreach my $i (@random_data) {
$instances++ if $i =~ /99/;
}
warn $instances if $warn_about_counts;
return;
},
'Using index' => sub {
my $instances = 0;
foreach my $i (@random_data) {
$instances++ if index($i, $search) > -1;
}
warn $instances if $warn_about_counts;
return;
},
});
What I was surprised at was how these performed (using Perl 5.10.0 on a recent MacBook Pro). In descending order of speed:
- Uncompiled regex with literal (69.0 ops/sec)
- Using index (61.0 ops/sec)
- Uncompiled regex with scalar (56.8 ops/sec)
- Using regex (17.0 ops/sec)
Can anyone offer an explanation as to what voodoo Perl is using to get the speed of the two uncomplied regular expressions to perform as well as the index operation? Is it an issue in the data I've used to generate the benchmark (looking for the occurrence of 99 in 100,000 random integers) or is Perl able to do a runtime optimisation?