tags:

views:

263

answers:

4

I have some code that at one part will get executed a lot, and I'm wondernig which way is a more efficient implementation. I will use a for loop to simulate the part the gets executed alot:

option A:

my %sections = (
    'somestring1' => 1,
    'somestring2' => 1,
    'somestring3' => 1,
    'somestring4' => 1
);

for (0..10000)
{
    # $element is chosen at random
    $namespace = $element if $sections{$element};
}

option B:

for (0..10000)
{
    # $element is chosen at random
    $namespace = $element if ($element eq'somestring1' || 
                            $element eq'somestring2' ||
                            $element eq'somestring3' ||
                            $element eq'somestring4');
}

Can anyone benchmark this or know the answer as I am not familiar with benchmarking tools.

This code probably doesn't make sense in this context but it is in fact what I need to use.

+4  A: 

The hash lookup mechanism is considerably faster.

chaos
+5  A: 

My guess is that the first version, with exists is going to be faster, not to mention more readable and maintainable.

for (0..10000)
{
    # $element is chosen at random
    $namespace = $element if exists $sections{$element};
}

Merely checking for the existence of a hash key is quicker than retrieving its value for comparison, so use exists.

Adam Bellaire
+3  A: 

This is probably the time to get familiar with benchmarking tools, like the CPAN module Benchmark.

Ether
This is true, however if the hash becomes very large, I can't imagine advocating inlining the values into a giant boolean expression for performance reasons.
Adam Bellaire
You're right, which means I misunderstood the question.. I thought the OP was wanting to check if the key was in a *subset* of all keys in the hash, rather than in the hash itself. Fixing response. :)
Ether
+14  A: 

Use the cmpthese function from the Benchmark module

use strict;
use warnings;
use Benchmark qw'cmpthese';

my %sections = (
    somestring1 => 1,
    somestring2 => 1,
    somestring3 => 1,
    somestring4 => 1
);

my @elements = map { 'somestring' . int(1 + rand(10)) } 1 .. 100;

my $namespace;

cmpthese(100000, {
    hash_value => sub {
        foreach my $element (@elements) {
            $namespace = $element if $sections{$element};
        }
    },
    hash_exists => sub {
        foreach my $element (@elements) {
            $namespace = $element if exists $sections{$element};
        }
    },
    string_cmp => sub {
        foreach my $element (@elements) {
            $namespace = $element if (
                $element eq'somestring1' ||
                $element eq'somestring2' ||
                $element eq'somestring3' ||
                $element eq'somestring4');
        }
    },
});

My results (running Perl 5.10 on WinXP):

               Rate  string_cmp  hash_value hash_exists
string_cmp  18932/s          --        -44%        -50%
hash_value  33512/s         77%          --        -12%
hash_exists 38095/s        101%         14%          --

So a hash lookup is 77% faster than the cascaded string comparisons and checking for hash existence instead of value (as Adam Bellaire suggested) is 14% faster still.

Michael Carman
I have a bad feeling, but I'd be interested to see how testing against the regex `/^somestring[1234]$/` would compare to those three methods. I expect it not to be as fast as `exists` but I'd like to see how it stacks up against the string comparison.
Chris Lutz
I show `/^somestring[1234]$/` as about 3% slower than the string comparisons, though real-world performance would depend heavily on the number of keys and the pattern (if any) in their values.
Michael Carman
<3 Michael Carman... give a man a fish... teach a man to fish... etc
mikegrb