ansaurus

Question

How can I sort an array so that certain file extensions sort to the top?

Answer 1

+2 A:

Sort takes an optional block as first argument, though in this case a Schwartzian transform would be quicker.

@files = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, !/\.txt$/ ] } @files;
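For readers new to the idiom, the one-liner above can be unrolled into its three stages. This is only an illustrative sketch: the sample file names and the intermediate variables @decorated and @ordered are assumptions made for the example, not part of the original answer.

```perl
use strict;
use warnings;

my @files = qw(buster.pdf mimi.txt roscoe.doc buster.txt);

# Step 1 (decorate): pair each name with a precomputed sort key.
# .txt files get key 0, everything else gets key 1.
my @decorated = map { [ $_, /\.txt$/ ? 0 : 1 ] } @files;

# Step 2 (sort): compare only the keys, so the regex runs once per
# file instead of once per comparison.
my @ordered = sort { $a->[1] <=> $b->[1] } @decorated;

# Step 3 (undecorate): drop the keys and keep the names.
my @sorted = map { $_->[0] } @ordered;

print "$_\n" for @sorted;
```

Chaining the three stages back together, read from right to left, gives the single statement shown above.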

Leon Timmermans 2010-03-26 17:42:35

In my tests, I find the Schwartzian Transform to be a bit slower (but only a bit). In my answer, I have to make two passes over the array, but in your example so do you. You also have to make references though.

brian d foy 2010-03-26 17:54:00

First time I'm hearing about the Schwartzian Transform. It's definitely interesting. Say I wanted the txt files first, then rpm, then the rest of the files. How would the code above need to be changed? I'm not sure I understand what *exactly* it's doing.

rarbox 2010-03-26 18:08:44

@rarbox: see http://stackoverflow.com/questions/594257/when-are-schwartzian-transforms-useful

Ether 2010-03-26 18:13:05

Answer 2

+4 A:

You just need to add a sort in front of each of your greps:

 my @sorted =
   (
   sort( grep /\.txt\z/,   @files ),
   sort( grep ! /\.txt\z/, @files )
   );

The trick here is that you are partitioning the list then sorting each partition independently. Depending on what you are doing, this might be a lot better than trying to do everything in one sort operation. Conversely, it might not always be better.

There are various other ways to get this done, but they aren't this simple. :)

Here's a quick benchmark on my MacBook Air with vanilla Perl 5.10.1:

There are 600 files to sort
     brian:  3 wallclock secs @ 369.75/s (n=1161)
   control:  3 wallclock secs @ 1811.99/s (n=5744)
      leon:  4 wallclock secs @ 146.98/s (n=463)
   mobrule:  3 wallclock secs @ 101.57/s (n=324)
      sort:  4 wallclock secs @ 559.62/s (n=1746)

Here's the script:

use Benchmark;

use vars qw(@files);

@files = qw(
    buster.pdf
    mimi.xls
    roscoe.doc
    buster.txt
    mimi.txt
    roscoe.txt
    ) x 100;


printf "There are %d files to sort\n", scalar @files;

sub leon {
    my @sorted =
        map  { $_->[0] }
        sort { $a->[1] <=> $b->[1] }
        map  { [ $_, !/\.txt$/ ] }
        @files;
    }

sub brian {
     my @sorted =
       (
       sort( grep /\.txt\z/,   @files ),
       sort( grep ! /\.txt\z/, @files )
       );
    }

sub mobrule {
    my @sorted = 
        sort { ($b=~/\.txt\z/) <=> ($a=~/\.txt\z/)  ||  $a cmp $b } 
        @files;
    }

sub plain_sort {
    my @sorted = sort @files;
    }

sub control {
    my @sorted = @files;
    }

timethese( -3,
     {
     brian   => \&brian,
     leon    => \&leon,
     mobrule => \&mobrule,
     control => \&control,
     sort    => \&plain_sort,
     }
     );

brian d foy 2010-03-26 17:46:35

Out of all of these, I think this is probably the cleanest and most obvious.

Robert P 2010-03-26 18:08:56

It's probably not the right answer based on his follow-up comment about wanting to sort on more file extensions.

brian d foy 2010-03-26 18:15:04

Answer 3

+4 A:

@sorted = sort { $b=~/\.txt$/ <=> $a=~/\.txt$/  ||  $a cmp $b } @files

will put .txt files first and otherwise sort lexicographically (alphabetically).

@sorted = sort { $b=~/\.txt$/ <=> $a=~/\.txt$/ } @files

will put .txt files first and otherwise preserve the original order (the mergesort that Perl has used by default since 5.8 is stable; say use sort 'stable'; if you want that guaranteed)
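A small side-by-side run can make the difference between the two comparators concrete. This is a sketch with made-up sample data; the names @alpha and @grouped are assumptions for the example. The parentheses around each match are only there for readability.

```perl
use strict;
use warnings;

my @files = qw(roscoe.doc mimi.txt buster.pdf buster.txt);

# Variant 1: .txt first, then alphabetical within each group.
my @alpha = sort {
    ($b =~ /\.txt$/) <=> ($a =~ /\.txt$/) || $a cmp $b
} @files;

# Variant 2: .txt first; the order of ties is left to the sort itself.
my @grouped = sort { ($b =~ /\.txt$/) <=> ($a =~ /\.txt$/) } @files;

print "alphabetical within groups: @alpha\n";
print "grouped only:               @grouped\n";
```

The first line prints buster.txt mimi.txt buster.pdf roscoe.doc. In the second line, the order inside each group depends on how the sort treats equal elements.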

mobrule 2010-03-26 17:48:27

Answer 4

+1 A:

Code golf? This will not produce nasty warnings:

@files = map { $_->[0] } sort { @$b <=> @$a } map { [$_, /\.txt$/] } @files

zakovyrya 2010-03-26 17:55:27

Nah, I'm not playing Code Golf. I was writing an FTP client with Net::FTPSSL and came across a situation where I needed files downloaded in a specific order and I wondered if there was a better way to sort than I'm doing already. Thanks for answering.

rarbox 2010-03-26 18:11:31

I'm almost sure that the @$b should be wrong, but I can't make this example not work. It seems to me that @$b should be coerced to a number instead of comparing something in the array, but I guess that's not happening. Why does it work?

brian d foy 2010-03-26 18:41:58

@brian d foy - if /\.txt$/ matches, it returns 1 and the resulting array reference looks like ['foo.txt', 1]; if it doesn't, /\.txt$/ produces an empty list, which gives an array reference like ['foo.bin']. So the arrays for matching names contain 2 elements and the others contain 1. And yes, you're right: in sort's block each array is coerced into its number of elements.
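The element counts can be checked directly. This is a minimal sketch; the %count hash and the two sample names are assumptions for the demonstration.

```perl
use strict;
use warnings;

my %count;

for my $name (qw(foo.txt foo.bin)) {
    # In list context, a successful match with no capture groups
    # returns (1); a failed match returns the empty list.
    my $pair = [ $name, $name =~ /\.txt$/ ];

    # Assigning an array to a scalar slot yields its element count,
    # which is what <=> sees when it is handed @$a and @$b.
    $count{$name} = @$pair;

    printf "%s -> %d element(s)\n", $name, $count{$name};
}
```

This prints 2 for foo.txt and 1 for foo.bin.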

zakovyrya 2010-03-27 04:50:07

Ah, tricky. Very nice. However, I wouldn't list that as good style. :)

brian d foy 2010-03-27 05:16:17

@brian d foy - Yep, I wouldn't put this into production either

zakovyrya 2010-03-27 05:56:02

Answer 5

+6 A:

You asked in a follow-up comment about doing this for more than one file extension. In that case, I'd build off the Schwartzian Transform. If you're new to the ST, I recommend Joseph Hall's explanation in Effective Perl Programming. Although the Second Edition is coming out very soon, we basically left his explanation as is, so the first edition is just as good. Google Books seems to only show one inch of each page for the first edition, so you're out of luck there.

In this answer, I use a weighting function to decide which extensions should move to the top. If an extension doesn't have an explicit weight, I just sort it lexicographically. You can fool around with the sort to get exactly the order that you want:

@files = qw(
    buster.pdf
    mimi.xls
    roscoe.doc
    buster.txt
    mimi.txt
    roscoe.txt
    buster.rpm
    mimi.rpm
    );

my %weights = qw(
    txt 10
    rpm  9
    );

my @sorted =
    map { $_->{name} }
    sort { 
        $b->{weight} <=> $a->{weight}
         ||
        $a->{ext}    cmp $b->{ext}
         ||
        $a->{name}   cmp $b->{name}   # compare the names, not the hash references
        }
    map {
        my( $ext ) = /\.([^.]+)\z/;
            +{ # unary plus marks this as an anonymous hash constructor, not a block
            name => $_,
            ext => $ext,
            weight => $weights{$ext} || 0,
            }
        }
    @files;

$" = "\n";
print "@sorted\n";

brian d foy 2010-03-26 18:29:53

Answer 6

+2 A:

To handle multiple extensions efficiently, you could modify brian d foy's sorted greps by partitioning your array in one pass and then sorting each partition independently.

use strict;
use warnings;

use List::MoreUtils qw(part);

my @files = qw(
    bar        Bar.pm       bar.txt
    bar.jpeg   foo          foo.pm
    foo.jpeg   zebra.txt    zebra.pm
    foo.bat    foo.c        foo.pl
    Foo.pm     foo.png      foo.tt
    orange     apple        zebra.stripe
);


my @parts = part { get_extension_priority($_) } @files;

my @sorted = map { sort( @{ $_ || [] } ) } @parts; 

print map "$_\n", @sorted;

BEGIN {

    # Set extension priority order
    my @priority = qw( stripe txt nomatch pl jpeg  );

    # make a hash to look up priority by extension
    my %p = map { $priority[$_], $_ } 0..$#priority;

    sub get_extension_priority {
        my $file = shift;

        # match against the argument rather than relying on $_
        return scalar @priority 
            unless $file =~ /[.](\w*)$/;

        return scalar @priority 
            unless exists $p{$1};

        return $p{$1};
    }
}

daotoad 2010-03-26 20:28:41

Very nice! Every time I see part() though, I wish he had just named it partition() :)

brian d foy 2010-03-26 20:56:19
