tags:

views:

205

answers:

6

Hi Everyone,

I have a list of strings whose values come from a fixed set. I need to sort this list in an arbitrary order.

The order of the set is specified by another list of all possible strings, sorted in order in an array.

Here is an example:

my @all_possible_strings_in_order = ('name', 'street', 'city','state', 'postalcode');

my @list_that_needs_to_be_sorted = ('city', 'state', 'name');

I am working in perl. I figure my best bet is to automatically create a hash that associates strings with ordinal numbers, and then sort by reference to those ordinal numbers.

There are about 300 possible strings in the set. Typical lists will have 30 strings that need to be sorted. This isn't going to be called in a tight loop, but it can't be slow either. Automatically building the ordinal hash can't be done ahead of time due to the structure of the program.

I'm open for suggestions on better ways to do this. Thanks!

Edit: You guys are awesome. I can't hold my head up any more tonight, but tomorrow morning I'm going to take the time to really understand your suggestions... It's about time I became proficient with map() and grep().

+6  A: 

Set up the association between strings and their respective positions with

my @all_possible_strings_in_order = qw/ name street city state postalcode /;

my %order = map +($order[$_] => $_), 0 .. $#all_possible_strings_in_order;

Now you can create a comparison function for use with Perl's sort operator:

sub by_order { $order{$a} <=> $order{$b} }

For example:

my @sorted = sort by_order qw/ city state name /;
print "@sorted\n";
# prints: name city state
Greg Bacon
`my %order; @order{@all_possible_strings_in_order} = 1 .. @all_possible_strings_in_order` -- little bit less evil / more readable? :)
hobbs
I prefer the `map` version (though with curlies), but in another question I profiled the two and found the slice assignment to be slightly faster (with a constant value repeated).
Chris Lutz
(Also, in my version, I believe we need the `map` version to initialize the `state` variable in one line. Not certain about that, though.)
Chris Lutz
Yes, that's nice, but I have a weird style hangup about separate declarations and initializations. Is the map really so bad? Too much Haskell for me?
Greg Bacon
@gbacon: then `@$_{@all}=1..@all for \my %order;`
ysth
Hash slices always trump anything else, just like waffles and ponies!
Ether
%order = ( '' => 4 ) due to missing array values in the unary assignment. should be: +($allpossible_strings_in_order[$_] => $_)
zen
+1  A: 

Here's an idea that's fairly simple.

Take your first string from your unsorted list, search for it in your master list, find its index in the master list, then place it in a list, and keep track of the index.

Take your second string, find its index in the master list. if that index is greater than the first, place it in your new list behind the first, otherwise in front.

Keep this up for all remaining strings, maintaining a list of all the indices, so that you always know where to place the next string re the strings already sorted.

Hope this is clear enough to help.

John Doner

John R Doner
+2  A: 

If you have Perl 5.10, you could use this (names shortened for clarity):

use feature 'state';

sub bylist {
  state %hash = map { $all_possible[$_] => $_ } 0 .. $#all_possible;
  $hash{$_[0]} cmp $hash{$_[1]};
}

my @sorted = sort bylist @list_to_sort;

The state keyword creates what in C is known as a static variable - it's local to the bylist subroutine, but it won't be reinitialized. That way, you don't have to set anything up beforehand, but don't have to recalculate the value every time you want to use it.

I believe there's a hack to get this to happen in older Perls, but I wouldn't use it. If you don't have 5.10, just use gbacon's idea that he shamelessly stole from my brain while I was typing this :P

Chris Lutz
Muwhahahahahaha!
Greg Bacon
The "hack" -- which you should never, ever do, and no longer works in Perl 5.10 -- is `my %hash if 0; %hash or %hash = ...`
ephemient
The non-hack work-around is to enclose the sub declaration in a block and define the state variable there. e.g. `{ my $x; sub foo { $x ||= 1; ... } }`
Michael Carman
Or even `{my %hash = ...; sub bylist { ... }}` though that will pre-calculate even if `bylist` is never entered. If you have to use pre-5.10 Perl, this is most recommended, and still works in 5.10. On the other hand, it's probably not the "hack" Chris was thinking of :)
ephemient
@ephemient: `{my %hash = ...; sub bylist {} }` only initializes if program flow reaches the block, which means you would have to put the sub definition in the middle of your program logic (and before calling it) rather than at the end as is normal.
Michael Carman
@Michael - You could make that block a `BEGIN {}` block, so that it's guaranteed to be initialized. (Though I tend to put my subroutine definitions at the beginning. Personal style.)
Chris Lutz
This really seems like half of an answer.
Brad Gilbert
+1  A: 

The most naive way to do this would be to sort based on a comparison function, where the comparison function comp(a,b) = "which of a and b comes first in the master list?", if I understand correctly.

So yeah, your idea looks about right. If you have to do a lot of sorts between changes to @all_possible_strings_in_order, then you should build the whole map once. If the order list changes every sort, you may be able to gain some speed with some clever lazy search, but maybe not.


my %order;
my $i = 0;
foreach my $s (@all_possible_strings_in_order) {
    $order{$s} = $i++;
}

my @sorted = sort {$order{$a} <=> $order{$b}} @list_that_needs_to_be_sorted;

I imagine this should be quite fast.

Peter Cordes
Your Perl needs a little work. `foreach my $s in (@list)` ? What's that `in` mean?
Chris Lutz
is it normal to get 4 answers in the span of 3 minutes, after no answers for 45 minutes? I feel like a perl noob for not using map or something with a range operator to generate the order hash.
Peter Cordes
@Chris: It means I've been using bash all afternoon.
Peter Cordes
No, but then again it's not really normal to get Perl questions on Stack Overflow these days. (And `map` and `..` aren't always the best answers. The only way for the OP to figure out what is best in this case is to profile it.)
Chris Lutz
@Chris Lutz: ?? there are Perl questions on SO almost every single day. Not the flood that some languages get, and not the ~20/day that perlmonks gets, but a steady trickle.
ysth
It's a very small trickle compared to Python, even though there's a lot more Perl code and programmers than Python. (Also, I conveniently happen to miss most of them.)
Chris Lutz
+5  A: 

A different approach (one that won't work if the list to be sorted can have duplicates that need to be preserved):

my %set;
@set{ @list_that_needs_to_be_sorted } = ();
my @sorted = grep exists $set{$_}, @all_possible_strings_in_order;
ysth
+2  A: 

You can just go over the master list and push any element that occurs in the unsorted list onto the result list, while removing it from the unsorted list. If your unsorted list is short (from your example, I reckon about 5 elements), this should be faster and smaller than building hash tables each time (you said you couldn't do it once beforehand).

An optimization might be to make a trie from the unsorted list, but whether this is better is dependent on the size of each list.

Svante
That's the same as ysth's answer.
Peter Cordes