views:

305

answers:

7

What is the easiest way to get a key with the highest value from a hash in Perl?

A: 
my $highest_val = (sort {$b <=> $a} keys %hash)[0];
amphetamachine
That is not what he asked.
jrockway
@jrockway - What? Does this not return the key with the highest value?
amphetamachine
That returns the key that's the highest value. I assume he wants the key that maps to the highest value. Otherwise, the question is too simple to be asking :) (And in that case, why not just "reverse sort keys %hash"?)
jrockway
It depends what you mean by "value" here. Usually a hash is thought of as key/value pairs, so I'd assume the same thing as jrockway. But it could also mean what amphetamachine said. The questioner should clarify.
Kinopiko
@jrockway - `And in that case, why not just "reverse sort keys %hash"?` - Because that's a lexical sort, and `sort {$b <=> $a}` hits two birds with one stone in that it's both a numeric sort AND it's reversed.
amphetamachine
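To make the lexical-vs-numeric distinction above concrete, here is a minimal sketch (sample data is invented) showing why `reverse sort keys %hash` and `sort {$b <=> $a} keys %hash` can disagree:

```perl
use strict;
use warnings;

# Invented sample data: numeric keys that sort differently
# lexically vs numerically.
my %hash = ( 9 => 'a', 10 => 'b', 2 => 'c' );

# Lexical: "10" lt "2" lt "9", so the "highest" key comes out as "9".
print +(reverse sort keys %hash)[0], "\n";        # 9

# Numeric descending: reversed AND numeric in one comparator.
print +(sort { $b <=> $a } keys %hash)[0], "\n";  # 10
```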
+3  A: 

The keys sorted by value, from lowest to highest:

sort { $hash{$a} <=> $hash{$b} } keys %hash

The keys sorted by value, from highest to lowest:

reverse sort { $hash{$a} <=> $hash{$b} } keys %hash

And the first element:

(reverse sort { $hash{$a} <=> $hash{$b} } keys %hash)[0]

Replace the spaceship with cmp to taste.
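A quick sanity check of the snippets above against a made-up hash (names and values invented for illustration):

```perl
use strict;
use warnings;

# Invented sample data.
my %hash = ( apple => 3, pear => 9, plum => 5 );

# Keys sorted by value, lowest to highest.
my @asc = sort { $hash{$a} <=> $hash{$b} } keys %hash;
print "@asc\n";                                             # apple plum pear

# Key with the highest value, via reverse ...
my $top  = (reverse sort { $hash{$a} <=> $hash{$b} } keys %hash)[0];
# ... or, equivalently, by swapping $a and $b in the comparator:
my $top2 = (sort { $hash{$b} <=> $hash{$a} } keys %hash)[0];
print "$top $top2\n";                                       # pear pear
```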

jrockway
Why not just use `values` instead of `keys`?
Kinopiko
Because he wants the key, not the value. The value is what to sort by, the key is what to return. Unless I am misreading the question.
jrockway
Ah, OK, sorry, I missed that.
Kinopiko
-1 for using reverse when only wanting one element
Alnitak
use `$hash{$b} <=> $hash{$a}` instead of `reverse`
knittl
+1  A: 
my $key_of_highest_val = (sort { $hash{$b} <=> $hash{$a} } keys %hash)[0];

is likely to be what you want.

If you have a very large hash, you might want to use something like a Schwartzian transform:

my @array = map { [$hash{$_}, $_] } keys %hash;
my $key_with_highest_value = (sort { $b->[0] <=> $a->[0] } @array)[0]->[1];
David M
This is more typing, but is O(n) instead of O(n log n), which is generally a good thing. If your list is big.
jrockway
The Schwartzian transform here only serves to reduce the number of hash table lookups, and does **not** change the complexity of the search - it's still O(n log n). The iterative approach from @jkasnicki is superior.
Alnitak
+5  A: 

The following is more space-efficient and will run in O(n) instead of O(n log n) as compared to the other answers which sort the hash. It assumes values are integers greater than 0 and the hash is not empty, but should be easily extended for your case.

my $key_for_max_value;
my $max_value = -1;
while ((my $key, my $value) = each %hash) {
  if ($value > $max_value) {
    $max_value = $value;
    $key_for_max_value = $key;
  }
}

$key_for_max_value will now be the key corresponding to the highest value.

jkasnicki
There is an assumption in your code that the values of the hash aren't all -1 or below. You should just make $max_value the value of the first thing seen or something.
Kinopiko
Nice to know _someone_ out there still appreciates efficiency over short-handedness. Good explanation, too.
amphetamachine
@Kinopiko: And that can be done with something like `my $max_value = undef;` and later, change the `if` to `if (! defined $max_value || $value > $max_value)`.
Robert P
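A sketch of the undef-initialized variant Robert P describes, exercised against invented all-negative data (the case that breaks the -1 sentinel):

```perl
use strict;
use warnings;

# Invented sample data: all values negative.
my %hash = ( a => -5, b => -2, c => -9 );

my ($key_for_max_value, $max_value);
while (my ($key, $value) = each %hash) {
    if (!defined $max_value || $value > $max_value) {
        $max_value         = $value;
        $key_for_max_value = $key;
    }
}
print "$key_for_max_value => $max_value\n";   # b => -2
```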
@amphetamachine for reasonably sized data sets, this solution is very likely to be slower than the one using `sort`.
hobbs
@hobbs how exactly do you make O(n log n) go faster than O(n) ?
Alnitak
@Alnitak by having a smaller constant factor. Let f(n) = n * log(n) / log(10) and g(n) = n * 1000000. f(n) = O(n log n) and g(n) = O(n). Now let n = 10. f(10) is ten, and g(10) is ten million. Furthermore, f(n) will be less than g(n) as long as n is less than ten to the millionth power. This despite the fact that f(n) dominates g(n).
hobbs
(It should be noted that since log n is considered a fairly slow-growing function, O(n) and O(n log n) are therefore "not very different", meaning it doesn't take a very large constant factor advantage for an O(n) function to beat out an O(n log n) one at small n.)
hobbs
@hobbs I don't think this solution will ever be slower than one involving sorting. Your argument is valid in general (constant factors can make O(n log n) preferable for small n), but in this case the constant factor on the O(n) solution is small: we look at each element exactly once and do a very small amount of computation with it. Finally, the real win of this solution is the space-savings. Sorting will take O(n) space, while this solution takes O(1) space. See @Eric Strom answer for another discussion and performance numbers.
jkasnicki
@jkasnicki - well put. Sure, there _might_ be particular cases where `O(n log n)` is smaller than `O(n)` (for small values of `n`), but this isn't one of them!
Alnitak
@jkasnicki : Put a short-circuit operator to define `$max_value` on the first pass: `$max_value ||= $value;`. That way you can get rid of the `-1` assumption
Zaid
+7  A: 

While the solution with sort:

(sort {$hash{$b} <=> $hash{$a}} keys %hash)[0]

found in some of the other answers is quite elegant, it doesn't perform as nicely as it looks. First off, the sort transforms an O(n) search operation into an O(n log n) one. Secondly, the sort solution performs n log n hash look-ups. Hash look-ups are very good for certain operations, but when working with the entire hash, look-ups will be slower than using each, keys, or values to iterate through the data structure. This is because the iterators do not need to calculate the hashes of keys, nor do they need to repeatedly walk through bins to find the values. And the overhead is not constant; it increases as the hash gets larger.

Here are a few faster solutions:

use strict;
use warnings;

my %hash = (
    small   => 1,
    medium  => 5,
    largest => 10,
    large   => 8,
    tiny    => 0.1,
);

Here is a solution using the each iterator (an O(1) operation done n times):

sub largest_value (\%) {
    my $hash = shift;
    keys %$hash;       # reset the each iterator

    my ($large_key, $large_val) = each %$hash;

    while (my ($key, $val) = each %$hash) {
        if ($val > $large_val) {
            $large_val = $val;
            $large_key = $key;
        }
    }
    $large_key
}

print largest_value %hash; # prints 'largest'

Or a faster version that trades memory for speed (it makes a copy of the hash):

sub largest_value_mem (\%) {
    my $hash   = shift;
    my ($key, @keys) = keys   %$hash;
    my ($big, @vals) = values %$hash;

    for (0 .. $#keys) {
        if ($vals[$_] > $big) {
            $big = $vals[$_];
            $key = $keys[$_];
        }
    }
    $key
}

print largest_value_mem %hash; # prints 'largest'

Here is the performance with various hash sizes:

10 keys:              Rate largest_with_sort largest_value largest_value_mem
largest_with_sort 111565/s                --           -8%              -13%
largest_value     121743/s                9%            --               -5%
largest_value_mem 127783/s               15%            5%                --

50 keys:             Rate  largest_with_sort largest_value largest_value_mem
largest_with_sort 24912/s                 --          -37%              -40%
largest_value     39361/s                58%            --               -6%
largest_value_mem 41810/s                68%            6%                --

100 keys:            Rate  largest_with_sort largest_value largest_value_mem
largest_with_sort  9894/s                 --          -50%              -56%
largest_value     19680/s                99%            --              -12%
largest_value_mem 22371/s               126%           14%                --

1,000 keys:         Rate   largest_with_sort largest_value largest_value_mem
largest_with_sort  668/s                  --          -69%              -71%
largest_value     2183/s                227%            --               -7%
largest_value_mem 2341/s                250%            7%                --

10,000 keys:        Rate   largest_with_sort largest_value largest_value_mem
largest_with_sort 46.5/s                  --          -79%              -81%
largest_value      216/s                365%            --              -11%
largest_value_mem  242/s                421%           12%                --

As you can see, if memory isn't much of an issue, the version with internal arrays is fastest, closely followed by the each iterator, and in a distant third... sort
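Tables like these are typically produced with the core Benchmark module's cmpthese; a minimal sketch (hash size and sub bodies are stand-ins, not the exact benchmark code used above):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented test data; vary the size to reproduce the tables above.
my %hash = map { "key$_" => rand } 1 .. 100;

cmpthese( -1, {    # -1: run each sub for about one CPU second
    largest_with_sort => sub {
        (sort { $hash{$b} <=> $hash{$a} } keys %hash)[0];
    },
    largest_value => sub {
        keys %hash;                      # reset the each iterator
        my ($large_key, $large_val) = each %hash;
        while (my ($key, $val) = each %hash) {
            ($large_val, $large_key) = ($val, $key) if $val > $large_val;
        }
        $large_key;
    },
});
```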

Eric Strom
+1 great and thorough answer!
Alnitak
Thorough answer. One comment though: the amortized complexity of a hash lookup is O(1), not O(log n).
jkasnicki
Comparing real-world speeds of hash lookup to array lookup still shows a nonlinear relationship: with 10 elements, an array is 50% faster than a hash; with 10,000 elements it is 100% faster; with 1,000,000 elements it is 210% faster...
Eric Strom
+1  A: 
my ($max_key, $max_val) = each %hash or die "hash is empty";
while (my ($key, $val) = each %hash) {
  $max_key = $key, $max_val = $val if $val > $max_val;
}
salva
+1  A: 

Not sure why everyone is doing this by hand...

use List::Util qw( reduce );
my $max_val_key = reduce { $hash{$a} > $hash{$b} ? $a : $b } keys %hash;
Dave Sherohman
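For completeness, a quick check of the reduce one-liner against invented data, plus one caveat worth knowing:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Invented sample data.
my %hash = ( low => 1, high => 42, mid => 7 );

my $max_val_key = reduce { $hash{$a} > $hash{$b} ? $a : $b } keys %hash;
print "$max_val_key\n";   # high

# Caveat: reduce returns undef on an empty list, so guard against
# an empty %hash if that is possible in your data.
```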