In the example module below, the getters and setters are generated by adding anonymous subroutines to the symbol table. After the methods have been created in this manner, will the resulting code be functionally equivalent (in terms of behavior, speed, etc.) to a module with manually-written getters and setters, or does this approach have some sort of inherent liability? (I have done some basic speed benchmarking and have not detected any differences so far.)

package Module;    
use strict;
use warnings;

BEGIN {
    my @attr = qw(author title number);
    no strict 'refs';
    for my $a (@attr){
        *{__PACKAGE__ . "::get_$a"} = sub { $_[0]->{$a}         };
        *{__PACKAGE__ . "::set_$a"} = sub { $_[0]->{$a} = $_[1] };
    }
}

sub new {
    my $class = shift;
    bless { @_ }, $class;
}

1;
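For reference, here is a self-contained sketch of the module above in use (the book data is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inline copy of the question's Module package, so this runs standalone.
package Module;

BEGIN {
    my @attr = qw(author title number);
    no strict 'refs';
    for my $a (@attr) {
        *{__PACKAGE__ . "::get_$a"} = sub { $_[0]->{$a}         };
        *{__PACKAGE__ . "::set_$a"} = sub { $_[0]->{$a} = $_[1] };
    }
}

sub new {
    my $class = shift;
    bless { @_ }, $class;
}

package main;

my $book = Module->new(author => 'Knuth', title => 'TAOCP');
print $book->get_author, "\n";   # prints: Knuth
$book->set_number(3);
print $book->get_number, "\n";   # prints: 3
```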
+7  A: 

There is no difference because:

sub Some_package::foo { ... }

is just a shorthand for:

BEGIN { *Some_package::foo = sub { ... } }
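A quick way to see the two spellings behave identically (the `Demo` package and sub names here are made up):

```perl
use strict;
use warnings;

package Demo;

# Conventional named sub.
sub named { 'hello' }

# Glob assignment at compile time -- the equivalent spelling.
BEGIN {
    no strict 'refs';
    *{'Demo::globbed'} = sub { 'hello' };
}

package main;

# By runtime, both are ordinary entries in the symbol table.
print Demo->can('named')   ? "named: installed\n"   : "named: missing\n";
print Demo->can('globbed') ? "globbed: installed\n" : "globbed: missing\n";
print Demo::named(),   "\n";   # prints: hello
print Demo::globbed(), "\n";   # prints: hello
```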

Reference from perlmod

/I3az/

draegtun
+2  A: 

Both approaches have the result of installing a subroutine reference into the symbol table at compile time. The behavior and runtime performance will be exactly the same. There might be a very small (i.e. negligible) difference in compile time.

A similar approach is to generate accessors on demand via AUTOLOAD, which does have a small impact at runtime. Using the AUTOLOAD approach can also change the behavior of things like $object->can().
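To illustrate that trade-off, here is a rough sketch of the AUTOLOAD approach (the `Lazy` package is invented for this example). It installs a getter on first use; note how `can()` answers differently before and after the first call:

```perl
use strict;
use warnings;

package Lazy;

my %attrs = map { $_ => 1 } qw(author title);

sub new { my $class = shift; bless { @_ }, $class }

our $AUTOLOAD;
sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';
    if ($name =~ /^get_(\w+)$/ && $attrs{$1}) {
        my $attr = $1;
        # Install the accessor on first use so later calls bypass AUTOLOAD.
        no strict 'refs';
        *$AUTOLOAD = sub { $_[0]->{$attr} };
        goto &$AUTOLOAD;
    }
    die "Unknown method $name";
}

package main;

my $o = Lazy->new(author => 'ysth');
print $o->can('get_author') ? "can before\n" : "cannot before\n";  # cannot before
print $o->get_author, "\n";                                        # ysth
print $o->can('get_author') ? "can after\n" : "cannot after\n";    # can after
```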

Obviously, generating methods will hide them from any form of static analysis, including tools like ctags.

Michael Carman
Seriously. Don't use AUTOLOAD for such things. It opens the flood gates to insanity. (I'm pretty sure you, MC, know that :)
tsee
**`WARNING:`** Don't use `AUTOLOAD` unless you know **exactly** what you are doing.
Brad Gilbert
+2  A: 

The runtime behaviour and performance should be pretty much the same (unless you do something that cares whether the methods are closures or not).

With large numbers of attributes, there will be a difference in compile time and memory use...both in favor of the generated getters and setters, not the manually-written ones. Try, for instance, this:

BEGIN {
    no strict 'refs';
    for my $a ("aaaa".."zzzz"){
        *{__PACKAGE__ . "::get_$a"} = sub { $_[0]->{$a}         };
        *{__PACKAGE__ . "::set_$a"} = sub { $_[0]->{$a} = $_[1] };
    }
}
print `ps -F -p $$`;  # adjust for your ps syntax

compared to

sub get_aaaa { $_[0]->{aaaa}         }
sub set_aaaa { $_[0]->{aaaa} = $_[1] }
sub get_aaab { $_[0]->{aaab}         }
...
sub set_zzzy { $_[0]->{zzzy} = $_[1] }
sub get_zzzz { $_[0]->{zzzz}         }
sub set_zzzz { $_[0]->{zzzz} = $_[1] }
print `ps -F -p $$`;  # adjust for your ps syntax
ysth
+8  A: 

There should be no difference in runtime performance if the resulting code is the same in both cases. This is usually not possible, however, unless you use string eval to create your subroutines. For example, the code you provided:

... = sub { $_[0]->{$a} };

will be ever-so-slightly slower than the code you would have written manually:

sub foo { $_[0]->{'foo'} }

simply because the former has to fetch the value of the variable $a before using it as a key into the hash, whereas the latter uses a constant as its hash key. Also, as an aside, shift usually tends to be faster than $_[0]. Here's some benchmark code:

use Benchmark qw(cmpthese);

package Foo;

sub manual_shift { shift->{'foo'} }
sub manual_index { $_[0]->{'foo'} }

my $attr = 'foo';

*dynamic_shift = sub { shift->{$attr} };
*dynamic_index = sub { $_[0]->{$attr} };

package main;

my $o = bless { foo => 123 }, 'Foo';

cmpthese(-2, {
  manual_shift  => sub { my $a = $o->manual_shift },
  manual_index  => sub { my $a = $o->manual_index },
  dynamic_shift => sub { my $a = $o->dynamic_shift },
  dynamic_index => sub { my $a = $o->dynamic_index },
});

and the results on my system:

                   Rate dynamic_index  manual_index dynamic_shift  manual_shift
dynamic_index 1799024/s            --           -3%           -4%           -7%
manual_index  1853616/s            3%            --           -1%           -4%
dynamic_shift 1873183/s            4%            1%            --           -3%
manual_shift  1937019/s            8%            4%            3%            --

They're so close that the differences may get lost in the noise, but over many trials I think you'll see that the "manual shift" variant is the fastest. As with any microbenchmark, though, you have to test your exact scenario on your hardware and your version of perl to be sure of anything.

And here's string eval thrown into the mix.

eval "sub eval_index { \$_[0]->{'$attr'} }";
eval "sub eval_shift { shift->{'$attr'} }";

It should be exactly the same as the "manual" variants, plus or minus the statistical noise. My results:

                   Rate dynamic_index manual_index dynamic_shift manual_shift eval_shift eval_index
dynamic_index 1820444/s            --          -1%           -2%          -3%        -4%        -5%
manual_index  1835005/s            1%           --           -1%          -2%        -3%        -4%
dynamic_shift 1858131/s            2%           1%            --          -1%        -2%        -3%
manual_shift  1876708/s            3%           2%            1%           --        -1%        -2%
eval_shift    1894132/s            4%           3%            2%           1%         --        -1%
eval_index    1914060/s            5%           4%            3%           2%         1%         --

Again, these are all so close that you'd have to take great pains and perform many trials to sort out the signal from the noise. But the difference between using a constant as a hash key and using a variable (whose value must first be retrieved) as a hash key should show through. (The shift optimization is a separate issue and is more likely to change one way or the other in past or future versions of perl.)

John Siracusa
your eval_index needs the $ in $_[0] escaped.
ysth
If you're going to generate accessors, you could just as well make use of a FAST generator. Cf. http://search.cpan.org/dist/Class-XSAccessor That will be faster than anything but direct hash access.
tsee
ysth: Thanks, I think the backslash got eaten while editing locally. I've corrected it.
John Siracusa
Add this to the benchmark " `sub eval_alias { my($arg)=@_; $arg->{'$attr'} }` "
Brad Gilbert
Brad Gilbert: I tried it and it's the slowest by far: eval_alias 1329967/s
John Siracusa
Wouldn't $a be inlined into each sub definition as a closure? $a shouldn't exist at runtime; it would now be a constant, therefore making the sub identical to the manually-written form.
Ether
+2  A: 

The only difference is start-up time. For simple code generation schemes the difference will be hard to measure. For more complex systems, it can add up.

A great example of this in action is Moose. Moose does all kinds of amazing code generation for you, but it has a significant impact on start-up times. This is enough of an issue that the Moose devs are working on a scheme to cache generated code in pmc files and load them instead of regenerating code every time.

Also consider something like Class::Struct. It does code generation using string eval (last time I checked). Even so, because it is very simple, it does not cause a significant slow-down at start-up.
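For reference, Class::Struct (a core module) generates combined get/set accessors named after the fields; the `Book` class below is illustrative:

```perl
use strict;
use warnings;

# Class::Struct builds the class (constructor plus accessors) at import time.
use Class::Struct Book => { author => '$', title => '$' };

my $b = Book->new(author => 'daotoad');
print $b->author, "\n";   # prints: daotoad
$b->title('Perl');        # the same method sets when given an argument
print $b->title, "\n";    # prints: Perl
```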

daotoad
+5  A: 

The main drawback of well-generated accessors is that they defeat tools that rely on static analysis. Your IDE's method auto-completion for example. If this is part of a big project, I heartily recommend you take a look at Moose. It's accessor generation done right (and much more). It is popular enough that support is being added to IDEs, so that the aforementioned issue will disappear in due time.
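If Moose is installed, the equivalent declaration is roughly this (the attribute names are illustrative):

```perl
package Book;
use Moose;

# 'rw' generates a combined reader/writer; 'ro' generates a reader only.
has author => (is => 'rw');
has title  => (is => 'ro');

package main;
use strict;
use warnings;

my $b = Book->new(author => 'tsee', title => 'Accessors');
print $b->author, "\n";       # prints: tsee
$b->author('someone else');   # the writer works because author is 'rw'
print $b->title, "\n";        # prints: Accessors
```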

There are many accessor generators on CPAN that are easy to use and generate moderately efficient code. If performance is an issue, then -- provided you stick to using accessor methods -- you can't get any faster than Class::XSAccessor since it uses highly optimized C/XS code for the accessors.
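If Class::XSAccessor is installed, accessor generation looks roughly like this (the `Fast` package is made up for the example):

```perl
package Fast;
use strict;
use warnings;

# Each map entry is: method_name => hash_key.
use Class::XSAccessor
    constructor => 'new',
    accessors   => { author => 'author', title => 'title' };

package main;

my $o = Fast->new(author => 'tsee');
print $o->author, "\n";   # prints: tsee
$o->title('XS');          # accessors are combined get/set
print $o->title, "\n";    # prints: XS
```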

Rolling your own accessor-generating code is the worst of all options. It defeats static analysis just the same, is likely rather hard to read, and potentially introduces new bugs.

tsee
+2  A: 

Aside from the excellent points the others have mentioned, I'd also like to add the main disadvantage I found: they all show up as anonymous subroutines in the profiler. For whatever reason, Devel::DProf just doesn't know how to figure out the name.

Now I would hope that the newer Devel::NYTProf may do a better job — but I haven't tried it.

Dominic Mitchell