Hi, I was wondering what the most effective way of accomplishing the following would be. (I know all three accomplish the same thing, but I was wondering which of the three most people would choose, and WHY.)

a.pl

my %hash = build_hash();
# do stuff with hash using $hash{$key}
sub build_hash
{    # build some hash
     my %hash = ();
     my @k = qw(hi bi no th xc ul 8e r);
     for ( @k )
     {
         $hash{$_} = 1;    # $_ is the current element of @k
     }
     # RETURNS A COPY OF HASH?
     return %hash;
}

b.pl

my $hashref = build_hash();
# do stuff with hash using $hashref->{$key}
sub build_hash
{    # build some hash
     my %hash = ();
     my @k = qw(hi bi no th xc ul 8e r);
     for ( @k )
     {
         $hash{$_} = 1;    # $_ is the current element of @k
     }
     # just return reference (smaller than making copy?)
     return \%hash;
}

c.pl

my %hash = %{build_hash()};
# do stuff with hash using $hash{$key}
# better because now we don't have to dereference our hashref each time using ->?

sub build_hash
{    # build some hash
     my %hash = ();
     my @k = qw(hi bi no th xc ul 8e r);
     for ( @k )
     {
         $hash{$_} = 1;    # $_ is the current element of @k
     }
     return \%hash;
}
+2  A: 

See the related SO question "perl return hashes from functions, best practice".

martin clayton
+8  A: 

I would return the reference to save the processing time of flattening the hash into a list of scalars, building the new hash and (possibly) garbage collecting the local hash in the subroutine.

David Harris
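A minimal sketch of the difference David Harris describes, using hypothetical `build_hash_list` and `build_hash_ref` variants of the question's sub: the list version hands back every key and every value as a separate scalar, while the reference version hands back a single scalar regardless of the hash's size.

```perl
use strict;
use warnings;

# Hypothetical variants of the question's build_hash
sub build_hash_list {
    my %hash = map { $_ => 1 } qw(hi bi no th xc ul 8e r);
    return %hash;    # flattened into a list of key/value scalars
}

sub build_hash_ref {
    my %hash = map { $_ => 1 } qw(hi bi no th xc ul 8e r);
    return \%hash;   # one scalar pointing at the hash
}

my @flat = build_hash_list();   # 8 keys become 16 scalars
my @ref  = build_hash_ref();    # always exactly 1 scalar

printf "list return: %d scalars, ref return: %d scalar\n",
    scalar(@flat), scalar(@ref);
# list return: 16 scalars, ref return: 1 scalar
```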
+5  A: 

What you're looking for is a hash slice:

# assigns the value 1 to every element of the hash

my %hash;                                   # declare an empty hash
my @list = qw(hi bi no th xc ul 8e r);      # declare the keys as a list
@hash{@list} =                              # for every key listed in @list,
                (1) x @list;                # ...assign to it the corresponding value in this list
                                            # which is (1, 1, 1, 1, 1...)  (@list in scalar context
                                            #   gives the number of elements in the list)

The x operator is described at perldoc perlop.
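A short illustration of the two forms of `x`: with a parenthesized left operand it repeats a list, otherwise it repeats a string. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @list = qw(hi bi no th xc ul 8e r);

# Parenthesized left operand: x repeats the list
my @ones = (1) x @list;      # right operand is in scalar context: 8
# Plain scalar left operand: x repeats the string
my $dashes = '-' x 5;

print "@ones\n";     # 1 1 1 1 1 1 1 1
print "$dashes\n";   # -----

# The hash slice from the answer above, fed by the repeated list
my %hash;
@hash{@list} = @ones;
```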

See perldoc perldsc and perldoc perlreftut for tutorials on data structures and references (both must-reads for beginners and experts alike). Hash slices themselves are mentioned in perldoc perldata.

Regarding returning a hash from a function, normally you should return the hash itself, not a reference. You could use a reference if the hash is huge and memory or time is a concern, but that shouldn't be your first worry -- getting the code working is.

Return values from functions are always lists (where returning a scalar is essentially a list of one element). A hash flattens to a list in Perl, and a list can build a hash: you can assign one to the other interchangeably (assuming the list has an even number of elements and there are no key collisions, which would result in some values being lost during the conversion):

use strict; use warnings;
use Data::Dumper;

sub foo
{
    return qw(key1 value1 key2 value2);
}

my @list = foo();
my %hash = foo();

print Dumper(\@list);
print Dumper(\%hash);

gives (the key order in the hash dump may vary between runs):

$VAR1 = [
          'key1',
          'value1',
          'key2',
          'value2'
        ];

$VAR1 = {
          'key2' => 'value2',
          'key1' => 'value1'
        };

PS. I highly recommend writing up small sample programs like the one above to play around with data structures and to see what happens. You can learn a lot by experimenting!
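In that spirit, a small experiment with the two caveats mentioned above: duplicate keys (the later pair wins) and flattening a hash back into a list. The data is made up for illustration.

```perl
use strict;
use warnings;

# Duplicate keys: list assignment processes pairs left to right,
# so the later value for a key overwrites the earlier one.
my %dup = (key1 => 'first', key2 => 'value2', key1 => 'second');
print "key1 => $dup{key1}\n";           # key1 => second
print scalar(keys %dup), " keys\n";     # 2 keys

# Round trip: flattening a hash and rebuilding it keeps the pairs,
# although the order of the flattened list is not guaranteed.
my @flat = %dup;
my %copy = @flat;
print scalar(@flat), " elements\n";     # 4 elements

# An odd-sized list would instead trigger the warning
# "Odd number of elements in hash assignment" under 'use warnings',
# and the last key would get the value undef.
```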

Ether
In the codebase I currently work with, that would be a simple function called `hashof` exported by the ProjectNamespace::DataManip module and approximately implemented like `sub hashof { return map { $_ => 1 } @_;}` (with some prototype sugar and the like). Our `hash_slice_of($hashref, @list)`, on the other hand, returns each key-value pair which `exists` in $hashref where the key is also in @list. As hash-manipulation functions, they all return hashes (even-sized lists) so that the return values are easier to work with and pass to each other.
fennec
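The ProjectNamespace::DataManip module is private, so the following is only a hypothetical reimplementation of `hashof` and `hash_slice_of` based on the description in the comment above (without the prototype sugar).

```perl
use strict;
use warnings;

# Hypothetical "hashof": turns a list of keys into key => 1 pairs
sub hashof { return map { $_ => 1 } @_ }

# Hypothetical "hash_slice_of($hashref, @list)": returns the key/value
# pairs from $hashref whose key is in @list and exists in the hash
sub hash_slice_of {
    my ($hashref, @keys) = @_;
    return map { $_ => $hashref->{$_} } grep { exists $hashref->{$_} } @keys;
}

my %seen   = hashof(qw(hi bi no));
my %config = (host => 'localhost', port => 8080, debug => 0);
my %subset = hash_slice_of(\%config, qw(port debug missing));

# Both helpers return even-sized lists, so their results feed each other
my %letters = hashof(qw(a b c));
my %wanted  = hash_slice_of(\%letters, qw(a c z));
```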
+1  A: 

Take care: a.pl returns a list with an even number of elements, not a hash. When you then assign such a list to a hash variable, the hash will be built with the elements at the even indices as keys and the elements at the odd indices as values. [EDIT: That was how I always saw the matter, but sub { ... %hash } actually behaves a bit differently from sub { ... @list }.]

For the same reason, building a hash the way you describe is as simple as:

my %hash = map { $_ => 1 } qw(hi bi no th xc ul 8e r);

My personal rule of thumb is to avoid references unless I really need them (e.g. nested structures, or when you really need to pass around a reference to the same thing).

EDIT: I thought about it a little more, and I think passing around hash refs is probably better after all, due to the way we use a hash. The paragraph above still holds for array refs, though.

Thanks for your comments, Schwern and Ether.

Inshallah
+1 to the use of map to build a hash from a list, -1 to the idea of avoiding references.
Schwern
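A quick check, using the question's keys, that the `map` idiom and the hash-slice idiom build the same hash; the `exists` lookup at the end shows the usual reason for building such a hash.

```perl
use strict;
use warnings;

my @words = qw(hi bi no th xc ul 8e r);

# map version from the answer above
my %by_map = map { $_ => 1 } @words;

# hash-slice version from Ether's answer, for comparison
my %by_slice;
@by_slice{@words} = (1) x @words;

# Both yield the same set of keys, each with the value 1
my $same = join(',', sort keys %by_map) eq join(',', sort keys %by_slice);
print $same ? "same keys\n" : "different keys\n";    # same keys

# Typical use: constant-time membership tests
my $has_xc = exists $by_map{xc};
print $has_xc ? "xc present\n" : "xc absent\n";      # xc present
```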
+1  A: 

a.pl and c.pl require a copy of the hash to be taken (and the hash internal to the function is marked as free memory). b.pl, on the other hand, builds the hash just once and requires little extra memory to return a reference upon which you can operate. Thus b.pl is more likely to be the most efficient form of the three, both in space and time.

PP
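To make the copy PP mentions concrete, a sketch in the style of b.pl and c.pl: dereferencing with `%{ ... }` builds a second hash, so changes to one do not show up in the other.

```perl
use strict;
use warnings;

sub build_hash {
    my %hash = map { $_ => 1 } qw(hi bi no th xc ul 8e r);
    return \%hash;
}

my $ref  = build_hash();    # b.pl style: one hash, one reference to it
my %copy = %{ $ref };       # c.pl style: a second, independent hash

$copy{extra} = 1;           # change only the copy
$ref->{hi}   = 0;           # change only the original

my $shared = exists $ref->{extra};
print $shared ? "shared\n" : "independent\n";   # independent
print "copy{hi} = $copy{hi}\n";                 # copy{hi} = 1
```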
+13  A: 

I prefer returning a hash ref for two reasons. One, it uses a bit less memory since there's no copy. Two, it lets you do this if you just need one piece of the hash.

my $value = build_hash()->{$key};

Learn to love hash references, you're going to be seeing them a lot once you start using objects.

Schwern
would you like some eggs with that? (an implicit hash reference :)
ysth
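A sketch of the one-liner Schwern mentions, next to what a list-returning version needs for the same lookup; `build_hash_ref` and `build_hash_list` are hypothetical names.

```perl
use strict;
use warnings;

sub build_hash_ref  { my %h = map { $_ => 1 } qw(hi bi no); return \%h }
sub build_hash_list { my %h = map { $_ => 1 } qw(hi bi no); return %h }

# Reference return: index straight into the call
my $v1 = build_hash_ref()->{hi};

# List return: an anonymous hash must be built around the list first
# (the leading + makes sure the braces are read as a hash constructor)
my $v2 = +{ build_hash_list() }->{hi};

print "$v1 $v2\n";   # 1 1
```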
+2  A: 

I'm going to go against the grain of what everyone else is saying and say that I prefer to have my data returned as a hash (well, as an even-sized list which is likely to be interpreted as a hash). I work in an environment where we tend to do things like the following code snippet, and it's much easier to combine and sort and slice and dice when you don't have to dereference every other line. (It's also nice to know that someone can't damage your hash, because you passed the entire thing by value. Edit: unless the hash values contain references to other objects/hashes/arrays, in which case you're in trouble anyway.)

my %filtered_config_slice = 
   hashgrep { $a !~ /^apparent_/ && defined $b } (
   map { $_->build_config_slice(%some_params, some_other => 'param') } 
   ($self->partial_config_strategies, $other_config_strategy)
);

This approximates something that my code might do: building a configuration for an object based on various configuration strategy objects (some of which the object knows about inherently, plus some extra guy) and then filters out some of them as irrelevant.

(Yes, we have nice tools like hashgrep and hashmap and lkeys that do useful things to hashes. $a and $b get set to the key and the value of each item in the list, respectively). (Yes, we have people who can program at this level. Hiring is obnoxious, but we have a quality product.)

If you don't intend to do anything like this resembling functional programming, or if you need more performance (have you profiled?) then sure, use hashrefs.

fennec
The problem is the memory use, not necessarily the speed. When you are cavalier with memory use, the footprint of your application tends to explode. See Schwern's answer.
brian d foy
+1, that's definitely a valid way to look at it.
Inshallah
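fennec's `hashgrep` is an in-house tool, but List::Util provides a comparable `pairgrep` (available since List::Util 1.29), which sets `$a` and `$b` to each key and value in turn. A sketch, assuming a recent enough List::Util and some made-up config data:

```perl
use strict;
use warnings;
use List::Util 1.29 qw(pairgrep);   # pairgrep was added in List::Util 1.29

# A flat key/value list, as a config-building function might return it
my @config_pairs = (
    apparent_size => 10,
    name          => 'widget',
    colour        => undef,
    apparent_mode => 'x',
    depth         => 3,
);

# Similar in spirit to hashgrep: $a is the key, $b is the value
my %filtered = pairgrep { $a !~ /^apparent_/ && defined $b } @config_pairs;

print join(', ', map { $_ . '=' . $filtered{$_} } sort keys %filtered), "\n";
# depth=3, name=widget
```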
+5  A: 

Why not return both? Context is a very powerful feature in Perl to allow your functions to "do what you mean". Often the decision of which is a better return value depends on how the calling code plans to use the value, which is exactly why Perl has the builtin wantarray.

sub build_hash {
    my @keys = qw(hi bi no th xc ul 8e r);   # example keys
    my %hash;
    @hash{@keys} = (1) x @keys;
    wantarray ? %hash : \%hash               # list context: pairs; scalar context: ref
}

my %hash = build_hash;  # list context, a list of (key => value) pairs
my $href = build_hash;  # scalar context, a hash reference
Eric Strom
`my($href) = build_hash(); # whoops` A little too much help for not enough win.
Schwern
@Schwern => If someone working in Perl can't recognize that the lvalue is imposing list context on the assignment they aren't going to get very far. Rather than hiding these features from people, those who understand them should help others use them properly.
Eric Strom
@Eric It's not about not understanding contexts. It's whether you expect build_hash() to return something different in list context. Don't trust that the user studies and forever remembers the documentation of every function. It's compounded in that `my($foo)` and `my $foo` are fairly easy to casually interchange. Also `function( build_hash() )`. Oops. Subtle bugs easily missed. Finally, returning a hash doesn't have the utility of returning a list. A hash must be stuck into a variable to be useful. Returning a list can be used implicitly, LISP style. So I question the value.
Schwern
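A sketch of the pitfall Schwern raises, using the context-sensitive `build_hash` from the answer (with the question's keys filled in as an assumption): the parentheses in `my ($oops)` impose list context, so it receives the first key rather than a reference.

```perl
use strict;
use warnings;

sub build_hash {
    my @keys = qw(hi bi no th xc ul 8e r);
    my %hash;
    @hash{@keys} = (1) x @keys;
    return wantarray ? %hash : \%hash;
}

my %hash   = build_hash();    # list context: key/value pairs
my $href   = build_hash();    # scalar context: a hash reference
my ($oops) = build_hash();    # list context again: the first KEY, not a ref

printf "%s / %s / %s\n",
    scalar(keys %hash), ref($href), (ref($oops) ? 'ref' : 'plain key');
# 8 / HASH / plain key
```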
+2  A: 

"Regarding returning a hash from a function, normally you should return the hash itself, not a reference. You could use a reference if the hash is huge and memory or time is a concern, but that shouldn't be your first worry -- getting the code working is."

I'm going to have to disagree with Ether here. There was a time when I took that position, but quickly found myself descending into a hell of having to remember which subs returned hashes and which returned hashrefs, which was a rather serious impediment to just getting the code working. It's important to standardize on either always returning a hash/array or always returning a hashref/arrayref unless you want to be constantly tripping over yourself.

As for which to standardize on, I see several advantages to going with references:

  • When you return a hash or array, what you're actually returning is a list containing a flattened copy of the original hash/array. Just like passing in hash/array parameters to a sub, this has the disadvantage that you can only send one list at a time. Granted, you don't often need to return multiple lists of values, but it does happen, so why choose to standardize on doing things in a way which precludes it?

  • The (usually negligible) performance/memory benefits of returning a single scalar rather than a potentially much larger chunk of data.

  • It maintains consistency with OO code, which frequently passes objects (i.e., blessed references) back and forth.

  • If, for whatever reason, it's important that you have a fresh copy of the hash/array rather than a reference to the original, the calling code can easily make one, as the OP demonstrated in c.pl. If you return a copy of the hash, though, there's no way for the caller to turn that into a reference to the original. (In cases where this is advantageous, the function can make a copy and return a reference to the copy, thus protecting the original while also avoiding the "this returns hashes, that returns hashrefs" hell I mentioned earlier.)

  • As Schwern mentioned, it's real nice to be able to do my $foo = $obj->some_data->{key}.

The only advantage I can see to always returning hashes/arrays is that it is easier for those who don't understand references or aren't comfortable working with them. Given that comfort with references takes a matter of weeks or months to develop, followed by years or decades of working with them fluently, I don't consider this a meaningful benefit.

Dave Sherohman
+1, maintaining consistency is a good point. However, since I use arrays more often as arrays than I use hashes as hashes, and since there would be a lot more dereferencing/safe-copying going on for array refs, I think they should be treated differently.
Inshallah
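To illustrate Dave Sherohman's first point, a sketch with a hypothetical `partition_words` sub that returns two hashes at once; only references let the caller receive them separately.

```perl
use strict;
use warnings;

# Hypothetical example: split a word list into two sets
sub partition_words {
    my (%short, %long);
    for my $w (@_) {
        if (length($w) <= 2) { $short{$w} = 1 } else { $long{$w} = 1 }
    }
    return (\%short, \%long);   # two references survive the return intact
}

my ($short, $long) = partition_words(qw(hi bi three no seven));
print join(',', sort keys %$short), ' | ', join(',', sort keys %$long), "\n";
# bi,hi,no | seven,three

# Returning (%short, %long) instead would flatten both hashes into one list,
# and the caller could no longer tell where the first hash ends.
```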