ansaurus

Question

How do I sort an array of hash references by one of the hash values?

Answer 1

+3 A:

I like your proposed solution:

my %sevs = (critical => 0, high => 1, ...);
my @sorted = sort { $sevs{$a->{bug_severity}} <=> $sevs{$b->{bug_severity}} } @unsorted

tster 2009-10-28 22:09:57

Thanks, tster; it's nice to hear that I'm on the right path, and it's useful to see it expressed differently.

Allan Anderson 2009-10-28 22:39:59

The other solutions are very educational; I think that this simple one is the best choice for my needs.

Allan Anderson 2009-10-29 00:24:05

Answer 2

+5 A:

To avoid calling getVal more times than necessary, you can use "decorate, sort, undecorate". Decorate is getting the information you actually care about for the sort:

my @decorated = map { [ $_, getVal($_) ] } @unsorted;

Then sort the decorated list:

my @sortedDecorate = sort { $a->[1] <=> $b->[1] } @decorated;

Then get the original information back (undecorate):

my @sorted = map { $_->[0] } @sortedDecorate;

Or the more Perl-ish way to do it:

@sorted = map { $_->[0] }
          sort { $a->[1] <=> $b->[1] }
          map { [ $_, getVal($_) ] } @unsorted;

jamessan 2009-10-28 22:13:53

And interesting idea. I like. (but don't do the last way, perl is already hard enough to understand!)

tster 2009-10-28 22:17:17

@tster "Don't do it the efficient way because I do not understand it" is what you are saying. This sorting method is well known and explained in the article I referenced. BTW, the language is Perl.

Sinan Ünür 2009-10-28 22:43:23

This is indeed the Schwartzian Transform. Named for me, but not by me.

Randal Schwartz 2009-10-28 22:58:45

I remember you mentioning that when teaching a perl class I was in, Randal. It still intrigues me that the community latched onto that term instead of the general decorate-sort-undecorate. :)

jamessan 2009-10-28 23:08:04

@Sinan, I'm not saying don't do this. I'm saying don't make it a one-liner because it makes it hard to understand. The one liner is not more efficient than breaking it into 3 easy to understand and read lines.

tster 2009-10-29 00:48:57

Answer 3

+3 A:

You can use the Schwartzian Transform:

my @sorted = map  { $_->[1] }
             sort { $a->[0] <=> $b->[0] }
             map  { [ $lookup{$_->{bug_severity}, $_ ] } 
             @unsorted;

Explanation:

map  { [ $lookup{$_->{bug_severity}, $_ ] } @unsorted;

maps each bug to an array reference whose first element is the numerical bug severity from the lookup table. Using the Schwartzian Transform, you look up the value only once for each bug in @unsorted.

Then,

sort { $a->[0] <=> $b->[0] }

sorts that array by the first element. Finally,

@sorted = map  { $_->[1] }

pulls out the original bugs from the array returned by sort.

There is really no need for getval when all it's doing is a hash lookup.

For automatically generating efficient sorters, CPAN module Sort::Maker is excellent:

use strict; use warnings;

use Sort::Maker;

my @bugs = (
    { name => 'bar', bug_severity => 'severe' },
    { name => 'baz', bug_severity => 'noncritical' },
    { name => 'foo', bug_severity => 'critical' },
);

my $sorter = make_sorter('ST',
    name      => 'severity_sorter',
    init_code => 'my %lookup = (
                     critical => 0,
                     severe => 1,
                     noncritical => -1 );',
    number    => [ code => '$lookup{$_->{bug_severity}}' ],
);

use Data::Dumper;
print Dumper $_ for severity_sorter( @bugs );

Output:

$VAR1 = {
          'name' => 'baz',
          'bug_severity' => 'noncritical'
        };
$VAR1 = {
          'name' => 'foo',
          'bug_severity' => 'critical'
        };
$VAR1 = {
          'name' => 'bar',
          'bug_severity' => 'severe'
        };

Note that the number of lookups that need to be made when using the naive method depends on the number of elements in @unsorted. We can count them using the simple program:

#!/usr/bin/perl

use strict;
use warnings;

my ($n_elements) = @ARGV;

my @keys = qw(a b c);
my %lookup = map { $keys[$_-1] => $_ } 1 .. @keys;

my @unsorted = map { $keys[rand 3] } 1 .. $n_elements;

my $n_lookups;

my @sorted = sort {
    $n_lookups += 2;
    $lookup{$a} <=> $lookup{$b}
} @unsorted;

print "It took $n_lookups lookups to sort $n_elements elements\n";

Output:

C:\Temp> tzt 10
It took 38 lookups to sort 10 elements

C:\Temp> tzt 100
It took 978 lookups to sort 100 elements

C:\Temp> tzt 1000
It took 10916 lookups to sort 1000 elements

C:\Temp> tzt 10000
It took 113000 lookups to sort 10000 elements

Therefore, one would need more information to decide whether the naive sort or using the Schwartzian Transform is the appropriate solution.

And here is a simple benchmark which seems to agree with @Ether's argument:

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw( cmpthese );

my ($n_elements) = @ARGV;

my @keys = qw(foo bar baz);
my %lookup = map { $keys[$_] => $_ } 0 .. $#keys;

my @unsorted = map { {v => $keys[rand 3]} } 1 .. $n_elements;

cmpthese(-1, {
    naive => sub {
        my @sorted = sort {
            $lookup{$a->{v}} <=> $lookup{$b->{v}}
        } @unsorted;
    },
    schwartzian => sub {
        my @sorted = map  { $_->[1] }
                     sort { $a->[0] <=> $b->[0] }
                     map  { [$lookup{$_->{v}}, $_] }
                     @unsorted;
    }
});

Output:

C:\Temp> tzt 10
               Rate schwartzian       naive
schwartzian 18842/s          --        -29%
naive       26357/s         40%          --

C:\Temp> tzt 100
              Rate       naive schwartzian
naive       1365/s          --        -11%
schwartzian 1532/s         12%          --

C:\Temp> tzt 1000
             Rate       naive schwartzian
naive       121/s          --        -11%
schwartzian 135/s         12%          --

Sinan Ünür 2009-10-28 22:15:11

Jamessan already posted this, and it is almost impossible to understand without requiring major effort.

tster 2009-10-28 22:21:58

@tster So what? Sometimes others post answers while I am typing. I voted his answer up the moment I noticed it.

Sinan Ünür 2009-10-28 22:24:41

Another well-explicated example :) thank you for the details. I've got lots to experiment with here.

Allan Anderson 2009-10-28 22:44:08

Answer 4

A:

You could use a lookup table to determine the ordering of the bugzilla severities, like this (using sample data to illustrate):

use strict; use warnings;
use Data::Dumper;

my @bugInfo = (
                { id => 1,
                  assignee => 'Bob',
                  severity => 'HIGH'
                },
                { id => 2,
                  assignee => 'Anna',
                  severity => 'LOW'
                },
                { id => 3,
                  assignee => 'Carl',
                  severity => 'EXTREME'
                },
              );
my %severity_ordering = (
    EXTREME => 0,
    HIGH => 1,
    MEDIUM => 2,
    LOW => 3,
);
sub byseverity
{
    $severity_ordering{$a->{severity}} <=> $severity_ordering{$b->{severity}}
}

my @sortedBugs = sort byseverity @bugInfo;
print Dumper(\@sortedBugs);

yields:

$VAR1 = [
          {
            'assignee' => 'Carl',
            'id' => 3,
            'severity' => 'EXTREME'
          },
          {
            'assignee' => 'Bob',
            'id' => 1,
            'severity' => 'HIGH'
          },
          {
            'assignee' => 'Anna',
            'id' => 2,
            'severity' => 'LOW'
          }
        ];

Ether 2009-10-28 22:25:35

...which is pretty much what you posted in your question (doh, didn't read it closely enough), as well as what tster said. So yes, I agree it's the best solution. :)

Ether 2009-10-28 22:29:42

I appreciate having my fuzzy idea spelled out for me; thank you for the detailed example that saves me some "argh, how does Perl do that again?" time.

Allan Anderson 2009-10-28 22:38:21

@Ether the general problem with this approach is that a lot more lookups are made than necessary. http://www.stonehenge.com/merlyn/UnixReview/col64.html

Sinan Ünür 2009-10-28 22:39:34

There's a lookup for every entry in the table being sorted, but 1. the lookup table is very short (only ~5 entries for the bugzilla example), and 2. in a Schwartzian transform you have to process every entry in the input data multiple times, which in this scenario is going to be of roughly equivalent expense. Unless I have missed something, a transform only pays off if the input data is relatively small compared to the table being used to determine sort order, and you also have to factor in the complexity of the code (simpler code is easier to debug than complex code).

Ether 2009-10-28 22:45:47

@Ether: What matters is how many bugs there are in `@unsorted` not how many values are in the lookup table.

Sinan Ünür 2009-10-28 22:57:14

@Sinan, ok fair enough. However, the real gains from using a transform are if the computation costs of executing the sort method n log(n) times exceeds the time required for the two map calls. The example in the UnixReview article involved a split; in the OP's example the data is already nicely parsable as a clean data structure. So I think in this case I think the transform won't buy us anything, even for a really huge list of bugs to process.

Ether 2009-10-28 23:07:43

ansaurus

tags:

views:

answers:

How do I sort an array of hash references by one of the hash values?

related questions