In the question "Is returning a whole array from a Perl subroutine inefficient" two people recommend against optimizing if there is no need for it. As a general rule, optimizing can add complexity, and if it's not needed, simple is better. But in this specific case, returning an array versus an array ref, I don't see that there's any added complexity, and I think consistency in the interface design would be more important. Consequently, I almost always do something like:

sub foo
{
   my($result) = [];

   #....build up the result array ref

   $result;
}

Is there a reason I should not do this, even for small results?

A: 

I am not sure if returning a reference is more efficient in this case; i.e. does Perl copy data returned by subroutines?

In general, if your array is constructed entirely within the subroutine, there is no obvious problem with returning a reference, because the array would otherwise be discarded anyway. However, if the reference is also passed elsewhere before it is returned, you may have two copies of the same reference, and the array may be modified through one where the holder of the other does not expect it.
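
A minimal sketch of that situation, assuming a hypothetical build_list sub that stashes a second reference in $cache (both names are made up for illustration). A change made through the returned reference shows up through the stashed one:

```perl
use strict;
use warnings;

# Hypothetical sub: builds an array and keeps a second reference to it.
my $cache;
sub build_list {
    my $result = [ 1, 2, 3 ];
    $cache = $result;      # second reference to the same array
    return $result;
}

my $list = build_list();
push @$list, 4;            # modify through the returned reference

print "@$cache\n";         # prints 1 2 3 4: both references see the change

# Comparing references numerically compares the addresses they point to.
my $same = ($list == $cache) ? 'same array' : 'different arrays';
print "$same\n";           # prints same array
```

If nothing else holds on to the reference, this cannot happen, which is why returning a reference to a freshly built array is safe.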

Tom Alsberg
Yes, perl copies data returned by subroutines.
ysth
You don't really have copies of references. References point to the same data, so if you change the data through one reference, you'll see the change when you access it through the other.
brian d foy
Yes, by copies of references, I meant of the reference itself (which can be changed to refer to something else), not of the data it refers to.
Tom Alsberg
+5  A: 

No. Except do "return $result;" for clarity.

I remember testing the efficiency of those, and the difference in performance was minimal for small arrays. For large arrays, returning a reference was way faster.
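
If you want to measure this yourself, here is a rough sketch using the core Benchmark module. The sub names and the array size are assumptions chosen for illustration, and the numbers will vary with your Perl version and machine:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @data = (1 .. 100_000);   # assumed size; shrink it to see the gap close

sub as_list { return @data }     # every element is copied to the caller
sub as_ref  { return \@data }    # only a single scalar is returned

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    list => sub { my @copy = as_list(); scalar @copy },
    ref  => sub { my $ref  = as_ref();  scalar @$ref },
});
```

This only measures the cost of the return itself, not of building the array.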

It's really a convenience thing for small results. Would you rather do this:

($foo,$bar) = barbaz();

Or returning a reference:

 $foobar = barbaz();
 $foobar->[0]; # $foo
 $foobar->[1]; # $bar

Another way to use a returned reference:

($foo,$bar) = @{barbaz()};

As a rule, once you decide which way to go, just keep to it for your module, since switching from one method to the other is confusing.

I typically return array references for lists of similar things, and an array when the response is composed of two to four different elements. More than that, I make a hash, since not all callers will care about all the response elements.

Mathieu Longtin
Voted up, but a little nitpick: I would write `($foo, $bar) = @{barbaz()}`
kmkaplan
A: 

If you are used to writing code like the first snippet in Mathieu Longtin's answer, a returned reference forces you to write ugly code like his second snippet, or this, which is not much better:

my ($foo,$bar) = @{barbaz()};

I think this is the biggest drawback of returning a reference instead of an array. If I want to return a small number of values of different kinds, I return an array and assign it directly to variables (as I would in Python, for example).

my ($status, $result) = do_something();
if ($status eq 'OK') {
    ...

If there are more values of various kinds, I return a hash ref (better for refactoring):

my ($status, $data, $foo, $bar, $baz) =
    @{do_something()}{qw(status data foo bar baz)};
if ($status eq 'OK') {
    ...

If the return values are all of the same kind, then whether to return an array or an array ref is debatable and depends on how many there are.
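
A self-contained sketch of the hash ref variant, with a made-up do_something that returns placeholder values. The point is that callers can slice out only the keys they care about, and adding a key later does not break them:

```perl
use strict;
use warnings;

# Hypothetical function returning several values of different kinds.
sub do_something {
    return { status => 'OK', data => [ 1, 2, 3 ], foo => 'x' };
}

# Hash slice on the returned reference: pick only the keys you need.
my ($status, $data) = @{ do_something() }{qw(status data)};

print "status=$status data=@$data\n" if $status eq 'OK';   # prints status=OK data=1 2 3
```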

Hynek -Pichi- Vychodil
+11  A: 

You shouldn't return an array reference if it's inconsistent with the rest of your interface. If everything else that you work with returns lists instead of references, don't be the odd duck who causes other programmers to remember the exception.

Unless you have large lists, this is really a micro-optimization issue. You should be so lucky if this is the bottleneck in your program.

As far as complexity goes, the difference between a reference and a list is so far down on the complexity scale that you have bigger problems if your programmers are struggling with that. Complicated algorithms and workflows are complex, but this is just syntax.

brian d foy
That was pretty much my thinking, but I wanted to make sure that there wasn't a "Best Practices" answer that I didn't know about.
Joe Casadonte
+1  A: 

I don't think you should feel constrained to only using one or two methods. You should however keep it consistent for each module, or set of modules.

Here are some examples to ponder on:

sub test1{
  my @arr;
  return @arr;
}
sub test2{
  my @arr;
  return @arr if wantarray;
  return \@arr;
}
sub test3{
  my %hash;
  return %hash;
}
sub test4{
  my %hash;
  return %hash if wantarray;
  return \%hash;
}
sub test5{
  my %hash;
  return @hash{ qw'one two three' } if wantarray;  # hash slice: the values for these keys
  return \%hash;
}
{
  package test;
  use Devel::Caller qw'called_as_method';
  sub test6{
    my $out;
    if( wantarray ){
      $out = 'list';
    }else{
      $out = 'scalar';
    }
    $out = "call in $out context";
    if( called_as_method ){
      $out = "method $out";
    }else{
      $out = "simple function $out";
    }
    return $out;
  }
}

I can see possibly using many of these in future projects, but some of them are rather pointless.

Brad Gilbert
I could get even stranger, by using http://search.cpan.org/perldoc?Devel::Callsite.
Brad Gilbert
A: 

Returning an array gives some nice benefits:

my @foo = get_array(); # Get list and assign to array.
my $foo = get_array(); # Get magnitude of list.
my ($f1, $f2) = get_array(); # Get first two members of list.
my ($f3,$f6) = (get_array())[3,6]; # Get specific members of the list.

sub get_array {
   my @array = 0..9;

   return @array;
}

If you return array refs, you'll have to write several subs to do the same work. Also, an empty array returns false in a boolean context, but an empty array ref does not.

if ( get_array() ) {
    do_stuff();
}

If you return array refs, then you have to do:

if ( @{ get_array_ref() } ) {
    do_stuff();
}

Except that if get_array_ref() fails to return a ref, say it returns an undef value instead, the program dies. One of the following will help:

if ( @{ get_array_ref() || [] } ) {
    do_stuff();
}

if ( eval{ @{get_array_ref()} } ) {
    do_stuff();
}

So if the speed benefits are needed or if you need an array ref (perhaps you want to allow direct manipulation of an object's collection element--yuck, but sometimes it must happen), return an array ref. Otherwise, I find the benefits of standard arrays worth preserving.

Update: It is really important to remember that what you return from a routine is not always an array or a list. What you return is whatever follows the return, or the result of the last operation. Your return value will be evaluated in context. Most of the time, everything will be fine, but sometimes you can get unexpected behavior.

sub foo {
    return $_[0]..$_[1];
}

my $a = foo(9,20);
my @a = foo(9,20);

print "$a\n";
print "@a\n";

Compare with:

sub foo {
    my @foo = ($_[0]..$_[1]);
    return @foo;
}

my $a = foo(9,20);
my @a = foo(9,20);

print "$a\n";
print "@a\n";

So, when you say "return an array" be sure you really mean "return an array". Be aware of what you return from your routines.
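
Another small illustration of the same point, using two made-up subs. An array in scalar context gives its element count, but a literal list after return is just the comma operator, which yields its last value:

```perl
use strict;
use warnings;

sub returns_array { my @items = (4, 5, 6); return @items }
sub returns_list  { return (4, 5, 6) }

my $from_array = returns_array();   # array in scalar context: element count
my $from_list  = returns_list();    # comma operator in scalar context: last value

print "$from_array\n";   # prints 3
print "$from_list\n";    # prints 6
```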

daotoad
A: 

Is there a reason I should not do this, even for small results?

There's not a perl-specific reason, meaning it's correct and efficient to return a reference to the local array. The only downside is that people who call your function have to deal with the returned array ref, and access elements with the arrow -> or dereference etc. So, it's slightly more troublesome for the caller.

+1  A: 

If the array is constructed inside the function there is no reason to return the array; just return a reference, since the caller is guaranteed that there will only be one copy of it (it was just created).

If the function is considering a set of global arrays and returning one of them, then it's acceptable to return a reference if the caller will not modify it. If the caller might modify the array, and this is not desired, then the function should return a copy.
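
A short sketch of the difference, assuming a hypothetical shared array @defaults and two made-up accessors. Note that [ @defaults ] is only a shallow copy; nested references would still be shared:

```perl
use strict;
use warnings;

my @defaults = ('red', 'green', 'blue');   # hypothetical shared data

sub defaults_shared { return \@defaults }       # caller can modify @defaults
sub defaults_copy   { return [ @defaults ] }    # caller gets a private shallow copy

push @{ defaults_copy() }, 'pink';     # changes only the copy
print scalar(@defaults), "\n";         # prints 3

push @{ defaults_shared() }, 'pink';   # changes the shared array
print scalar(@defaults), "\n";         # prints 4
```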

This really is a uniquely Perl problem. In Java you always return a reference, and the function prevents the array from being modified (if that is your goal) by finalizing both the array and the data that it contains. In Python, references are returned and there is no way to prevent them from being modified; if that's important, a reference to a copy is returned instead.

vy32
The issue isn't unique to Perl; C can return a value or a reference. I'm sure there are plenty of other languages that offer similar choices in function call/return conventions. Java and Python are OOP languages--objects are passed as references, so it makes sense for them to pass by reference.
daotoad
A: 

I just want to comment on the idea about the clumsy syntax of handling an array reference as opposed to a list. As brian mentioned, you really shouldn't do it if the rest of the system is using lists. It's an unneeded optimization in most cases.

However, if that is not the case, and you are free to create your own style, then one thing that can make the coding less smelly is using autobox. autobox turns SCALAR, ARRAY and HASH (as well as others) into "packages", such that you can code:

my ( $name, $number ) = $obj->get_arrayref()->items( 0, 1 );

instead of the slightly more clumsy:

my ( $name, $number ) = @{ $obj->get_arrayref() };

by coding something like this:

sub ARRAY::slice {
    my $arr_ref = shift;
    my $length  = @$arr_ref;
    # Clamp out-of-range indexes to the first or last element.
    my @subs    = map { abs($_) < $length ? $_ : $_ < 0 ? 0 : $#$arr_ref } @_;
    # No indexes: the whole array. Two indexes: a range. Otherwise: those elements.
    return $arr_ref                                if @subs == 0;
    return [ @{$arr_ref}[ $subs[0]..$subs[1] ] ]   if @subs == 2;
    return [ @{$arr_ref}[ @subs ] ];
}

sub ARRAY::items { return @{ &ARRAY::slice }; }

Keep in mind that autobox requires you to implement all the behaviors you want from these. $arr_ref->pop() doesn't work until you define sub ARRAY::pop, unless you use autobox::Core.

Axeman
+1  A: 

I'll copy the relevant portion of my answer from the other question here.

The oft overlooked second consideration is the interface. How is the returned array going to be used? This is important because whole array dereferencing is kinda awful in Perl. For example:

for my $info (@{ getInfo($some, $args) }) {
    ...
}

That's ugly. This is much better.

for my $info ( getInfo($some, $args) ) {
    ...
}

It also lends itself to mapping and grepping.

my @info = grep { ... } getInfo($some, $args);

But returning an array ref can be handy if you're going to pick out individual elements:

my $address = getInfo($some, $args)->[2];

That's simpler than:

my $address = (getInfo($some, $args))[2];

Or:

my @info = getInfo($some, $args);
my $address = $info[2];

But at that point, you should question whether @info is truly a list or a hash.

my $address = getInfo($some, $args)->{address};

Unlike arrays vs array refs, there's little reason to choose to return a hash over a hash ref. Hash refs allow handy short-hand, like the code above. And opposite of arrays vs refs, it makes the iterator case simpler, or at least avoids a middle-man variable.

for my $key (keys %{ some_func_that_returns_a_hash_ref() }) {
    ...
}

What you should not do is have getInfo() return an array ref in scalar context and an array in list context. This muddles the traditional use of scalar context as array length which will surprise the user.
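
To make the surprise concrete, here is a sketch with two made-up subs. A caller who expects the usual count in scalar context gets a reference instead:

```perl
use strict;
use warnings;

sub plain_things   { my @things = (1, 2, 3); return @things }
sub context_things { my @things = (1, 2, 3); return wantarray ? @things : \@things }

my $count = plain_things();     # the usual "array in scalar context"
my $oops  = context_things();   # an array reference, not a count

print "$count\n";               # prints 3
print ref($oops), "\n";         # prints ARRAY
```

Code such as if ( context_things() > 2 ) would then compare a reference's address rather than a length, without any error.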

I would like to add that while making everything consistently do X is a good rule of thumb, it is not of paramount importance in designing a good interface. Go a bit too far with it and you can easily steamroll other more important concerns.

Finally, I will plug my own module, Method::Signatures, because it offers a compromise for passing in array references without having to use the array ref syntax.

use Method::Signatures;

method foo(\@args) {
    print "@args";      # @args is not a copy
    push @args, 42;   # this alters the caller array
}

my @nums = (1,2,3);
Class->foo(\@nums);   # prints 1 2 3
print "@nums";        # prints 1 2 3 42

This is done through the magic of Data::Alias.

Schwern
+1  A: 

An important omission in the above answers: don't return references to private data!

For example:

package MyClass;

sub new {
  my($class) = @_;
  bless { _things => [] } => $class;
}

sub add_things {
  my $self = shift;
  push @{ $self->{_things} } => @_;
}

sub things {
  my($self) = @_;
  $self->{_things};  # NO!
}

Yes, users can peek directly under the hood with Perl objects implemented this way, but don't make it easy for users to unwittingly shoot themselves in the foot, e.g.,

my $obj = MyClass->new;
$obj->add_things(1 .. 3);

...;

my $things = $obj->things;
my $first = shift @$things;

Better would be to return a (perhaps deep) copy of your private data, as in

sub things {
  my($self) = @_;
  @{ $self->{_things} };
}
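
For the deep-copy case, one option is dclone from the core Storable module. This is only a sketch under that assumption; MyCollection is a made-up class, and dclone adds a cost you may not want for large or frequently read data:

```perl
use strict;
use warnings;
use Storable ();

package MyCollection;   # hypothetical class, for illustration only

sub new { my ($class) = @_; return bless { _things => [] }, $class }

sub add_things { my $self = shift; push @{ $self->{_things} }, @_; return }

# Return a deep copy so nested structures are not shared with the caller.
sub things { my ($self) = @_; return @{ Storable::dclone($self->{_things}) } }

package main;

my $obj = MyCollection->new;
$obj->add_things({ name => 'a' }, { name => 'b' });

my @things = $obj->things;
$things[0]{name} = 'changed';      # affects only the caller's copy

my ($first) = $obj->things;
print "$first->{name}\n";          # prints a
```
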
Greg Bacon
A: 

Since nobody has mentioned wantarray, I will :-)

I consider it good practice to let the caller decide in which context it wants the result. For instance, in the code below, you ask Perl for the context in which the subroutine was called and decide what to return.

sub get_things {
    my @things;
    ... # populate things
    return wantarray ? @things : \@things;
}

Then

for my $thing ( get_things() ) {
    ...
}

and

my @things = get_things();

works properly because of the list context, and:

my $things = get_things();

will return the array's reference.

For more info about wantarray you might want to check perldoc -f wantarray.

Edit: I overlooked one of the first answers, which mentioned wantarray, but I think this answer is still valid because it makes it a bit clearer.

Igor