ansaurus

Question

Should Perl hashes always contain values?

Answer 1

+4 A:

You can't create a hash key without a value. The value can be undef but it will be there. How else would you construct a hash. Or was your question regarding whether the value can be undef? In which case I would say that the value you store there (undef, 1, 0...) is entirely up to you. If a lot of folks are using it then you probably want to store some true value though incase some one else uses if ($hash{$extension}) {...} instead of exists because they weren't paying attention.

Jeremy Wall 2009-05-27 19:17:37

You can certainly create a hash key without a value. That's why that nifty defined() operator exists. :)

brian d foy 2009-05-27 21:39:38

perl -le '@hash{ qw(a b c ) } = (); print keys %hash'

brian d foy 2009-05-27 21:42:04

Answer 2

+1 A:

undef is a value.

Of course, stuff like that is always depndent on what you are currently doing. But $foo{bar} is just a variable like $bar and I don't see any reason why either one should not be undef every now and then.

PS: That's why exists exists.

innaM 2009-05-27 19:23:12

Actually, undef is the absence of value. It's what you get before you assign anything.

brian d foy 2009-05-27 21:38:23

@brian d foy: I can understand your position but I tend to agree with Manni's perspective. An undefined value is still a value and Perl's undefined is unusually well-defined. :P

Michael Carman 2009-05-27 23:45:21

Answer 3

+4 A:

It's usually preferable to set a defined value for every key. The idiomatic value (when you don't care about the value) is 1.

my %hash = map { $_ => 1 } @array;

Doing it this way makes the code the uses the hash slightly simpler because you can use $hash{key} as a Boolean value. If the value can be undefined you need to use the more verbose exists $hash{key} instead.

That said, there are situations where a value of undef is desirable. For example: imagine that you're parsing C header files to extract preprocessor symbols. It would be logical to store these in a hash of name => value pairs.

#define FOO 1
#define BAR

In Perl, this would map to:

my %symbols = ( FOO => 1, BAR => undef);

In C a #define defines a symbol, not a value -- "defined" in C is mapped to "exists" in Perl.

Michael Carman 2009-05-27 20:06:46

The problem with just testing `$hash{$key}` is that if `$key` does not exist in the hash, you just modified the hash. The bugs that can result from this are epic.

daotoad 2009-05-29 17:36:31

@daotoad: Modified how? Autovivification doesn't occur for simple retrieval (but would if you did something like $hash{$key}{subkey}).

Michael Carman 2009-05-29 18:03:44

Answer 4

+3 A:

As others have said, the idiomatic solution for a hashset (a hash that only contains keys, not values) is to use 1 as the value because this makes the testing for existence easy. However, there is something to be said for using undef as the value. It will force the users to test for existence with exists which is marginally faster. Of course, you could test for existence with exists even when the value is 1 and avoid the inevitable bugs from users who forget to use exists.

Chas. Owens 2009-05-27 21:44:26

I use exists() because I think of this as a set operation rather than a dictionary lookup.

brian d foy 2009-05-27 22:24:55

Exactly, it is just a better way of doing it (faster and clearer), but the problem comes from people who are used to the idiom where they can check the value. That is why I say the safest thing is to use 1 as the value, but use exists.

Chas. Owens 2009-05-27 23:23:59

Using exists() because it's faster is almost certainly a case of improper optimization. Whether or not it's clearer is debatable; not everyone has brian's clarity of thought and attention to nuance. That said, +1 for the combination of precise expression (using exists for testing) but tolerant coding (using 1 for values).

Michael Carman 2009-05-28 00:02:01

@Michael Carman When choosing between two options that seem roughly equal in value determining which one is faster as a tie breaker seems reasonable. I wouldn't suggest benchmarking every idiom and using the faster one. For instance, I often prefer to use a map over a for loop for initializing arrays. I have benchmarked it before and the for loop with a push is slightly faster, but the clarity and conciseness of the map outweighs the minor (and possibly version dependent) performance gain.

Chas. Owens 2009-05-28 02:40:34

Given all that has been said, is a value of 1 better than some other value? For instance if the purpose of the hash is to determine whether something should be included, does it make sense to use a value of "include" or does that not accomplish anything that a value of 1 can't except waste memory?

Dr. Faust 2009-05-28 16:33:37

Dr. Faust You could easily use "include", but since the value is never used (except in the case where someone forgets to use exists) it would mean more typing for little reason.

Chas. Owens 2009-05-28 18:02:02

@Dr. Faustus: Any true value would work but by using 1 it's clearer that all you really care about is Boolean truth since that's the normal value for "true." If you use anything other than 0 or 1 it implies that the value is something other than a Boolean.

Michael Carman 2009-05-29 03:00:54

That's a very point Michael - Thanks.

Dr. Faust 2009-05-29 13:13:26

Don't set your values to 1. The problem with using 1 is that you silently add keys to the hash when you test for a value. If this is used in a code base that also tests existence you have just created one hell of an intermittent run-time bug. Use undef and test for existence. Consider locking your hashes.

daotoad 2009-05-29 17:40:23

Yes. My application is just testing the existence of keys. So your are saying I should have their values set to undef.

Dr. Faust 2009-05-29 19:44:40

@daotoad: Hash retrieval does not trigger autovivification. It's using the value (e.g. by dereferencing or incrementing it) that causes it to autovivify. Running perl -e "%h = qw(a 1 b 2); print $h{x}; print keys %h" prints "ab"

Michael Carman 2009-05-29 21:25:52

@daotoad Autovivification is a separate issue, if (exists $foo{bar}{baz}) {} autovivifies a bar key in %foo even though you are using exists.

Chas. Owens 2009-05-29 21:57:39

@Fred Zeppelin No, you should use 1 as the values, but you should also use the exists function rather than trusting the value to be true.

Chas. Owens 2009-05-29 21:59:03

Answer 5

+1 A:

Storing '1' in a Set-hash Considered Harmful

I know using Considered Harmful is considered harmful, but this is bad; almost as bad as unrestrained goto usage.

Ok, I've harped on this in a few comments, but I think I need a full response to demonstrate the issue.

Lets say we have a daemon process that provides back-end inventory control for a shop that sells widgets.

my @items = qw(
    widget
    thingy
    whozit
    whatsit
);

my @items_in_stock = qw(
    widget
    thingy
);

my %in_stock;
my @in_stock(@items_in_stock) = (1) x @items_in_stock;  #initialize all keys to 1

sub Process_Request {
    my $request = shift;

    if( $request eq REORDER ) {
        Reorder_Items(\@items, \%in_stock);
    }
    else { 
        Error_Response( ILLEGAL_REQUEST );
    }
}

sub Reorder_Items{
   my $items = shift;
   my $in_stock =  shift;

   # Order items we do not have in-stock.
   for my $item ( @$items ) {

       Reorder_Item( $item ) 
           if not exists $in_stock->{$item};
   }

}

The tool is great, it automatically keeps items in stock. Very nice. Now, the boss asks for automatically generated catalogs of in-stock items. So we modify Process_Request() and add catalog generation.

sub Process_Request {
    my $request = shift;

    if( $request eq REORDER ) {
        Reorder_Items(\@items, \%in_stock);
    }
    if( $request eq CATALOG ) {
        Build_Catalog(\@items, \%in_stock);
    }
    else { 
        Error_Response( ILLEGAL_REQUEST );
    }
}

sub Build_Catalog {
    my $items = shift;
    my $in_stock = shift;

    my $catalog_response = '';
    foreach my $item ( @$items ) {
        $catalog_response .= Catalog_Item($item)
            if $in_stock->{$item};
    }

    return $catalog_response;
}

In testing Build_Catalog() works fine. Hooray, we go live with the app.

Oops. For some reason nothing is being ordered, the company runs out of stock of everything.

The Build_Catalog() routine adds keys to the %in_stock, so Reorder_Items() now sees everything as in stock and never makes an order.

Using Hash::Util's lock_hash can help prevent accidental hash modification. If we locked the %in_stock before calling 'Build_Catalog()` we would have gotten a fatal error and would never have gone live with the bug.

In summary, it is best to test existence of keys rather than truth of your set-hash values. If you are using existence as a signifier, don't set your values to '1' because that will mask bugs and make them harder to track. Using lock_hash can help catch these problems.

If you must check for the truth of the values, do so in every case.

daotoad 2009-05-29 18:31:16

Answer 6

+1 A:

Using undef as a value in hash is more memory efficient than storing 1.

2009-05-29 21:55:22

ansaurus

tags:

views:

answers:

Should Perl hashes always contain values?

Storing '1' in a Set-hash Considered Harmful

related questions