I have an array containing a list of files. I want to sort it in a way that it will let me have .txt files in the beginning of the array and the rest of files after that.

This is what I'm doing now, which works fine.

@files = (grep(/\.txt$/,@files),grep(!/\.txt$/,@files));

Is there a better way to do it though?

+2  A: 

Sort takes an optional block as first argument, though in this case a Schwartzian transform would be quicker.

@files = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, !/\.txt$/ ] } @files;
Leon Timmermans
In my tests, I find the Schwartzian Transform to be a bit slower (but only a bit). In my answer, I have to make two passes over the array, but in your example so do you. You also have to make references though.
brian d foy
First time I'm hearing about the Schwartzian Transform. It's definitely interesting. Say I wanted the txt files first, then rpm, then the rest of the files. How would the code above need to be changed? I'm not sure I understand what *exactly* it's doing.
rarbox
@rarbox: see http://stackoverflow.com/questions/594257/when-are-schwartzian-transforms-useful
Ether
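For readers who, like rarbox, want to see what the transform is doing, the one-liner above can be unrolled into its three stages. This is only a sketch: the sample file list and the names @decorated and @ordered are made up for illustration, and the key is written as an explicit 0/1 instead of the negated match.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the question.
my @files = qw(b.pdf a.txt c.doc d.txt);

# 1. Decorate: pair each name with a sort key (0 for .txt, 1 otherwise).
my @decorated = map { [ $_, /\.txt$/ ? 0 : 1 ] } @files;

# 2. Sort on the precomputed key only, so the regex runs once per file.
my @ordered = sort { $a->[1] <=> $b->[1] } @decorated;

# 3. Undecorate: drop the keys and keep just the names.
my @sorted = map { $_->[0] } @ordered;

print "$_\n" for @sorted;
```

Supporting a second extension such as rpm would then only mean changing the key computed in step 1, for example by looking the extension up in a hash of weights.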
+4  A: 

You just need to add a sort in front of each of your greps:

 my @sorted =
   (
   sort( grep /\.txt\z/,   @files ),
   sort( grep ! /\.txt\z/, @files )
   );

The trick here is that you are partitioning the list then sorting each partition independently. Depending on what you are doing, this might be a lot better than trying to do everything in one sort operation. Conversely, it might not always be better.

There are various other ways to get this done, but they aren't this simple. :)
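One such alternative, shown only as a sketch, is to partition in a single pass with a plain loop instead of running grep twice. The sample list and the names @txt and @other are assumptions for illustration, and this variant was not part of the timings reported in this answer.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @files = qw(buster.pdf mimi.txt roscoe.doc buster.txt);

# Walk the list once, dropping each name into one of two buckets.
my ( @txt, @other );
for my $file (@files) {
    if ( $file =~ /\.txt\z/ ) { push @txt,   $file }
    else                      { push @other, $file }
}

# Sort each bucket independently and concatenate, .txt bucket first.
my @sorted = ( sort(@txt), sort(@other) );

print "$_\n" for @sorted;
```

Whether this beats the two-grep version depends on the data; it trades a second regex pass for explicit pushes.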

Here's a quick benchmark on my MacBook Air with vanilla Perl 5.10.1:

There are 600 files to sort
     brian:  3 wallclock secs @ 369.75/s (n=1161)
   control:  3 wallclock secs @ 1811.99/s (n=5744)
      leon:  4 wallclock secs @ 146.98/s (n=463)
   mobrule:  3 wallclock secs @ 101.57/s (n=324)
      sort:  4 wallclock secs @ 559.62/s (n=1746)

Here's the script:

use Benchmark;

use vars qw(@files);

@files = qw(
    buster.pdf
    mimi.xls
    roscoe.doc
    buster.txt
    mimi.txt
    roscoe.txt
    ) x 100;


printf "There are %d files to sort\n", scalar @files;

sub leon {  
    my @sorted = 
        map { $_->[0] } 
        sort { $a->[1] <=> $b->[1] } 
        map { [ $_, !/\.txt$/ ] } 
        @files;
    }

sub brian {
     my @sorted =
       (
       sort( grep /\.txt\z/,   @files ),
       sort( grep ! /\.txt\z/, @files )
       );
    }

sub mobrule {
    my @sorted = 
        sort { ($b=~/\.txt\z/) <=> ($a=~/\.txt\z/)  ||  $a cmp $b } 
        @files;
    }

sub plain_sort {
    my @sorted = sort @files;
    }

sub control {
    my @sorted = @files;
    }

timethese( -3,
     {
     brian   => \&brian,
     leon    => \&leon,
     mobrule => \&mobrule,
     control => \&control,
     sort    => \&plain_sort,
     }
     );
brian d foy
out of all of these, I think this is probably the cleanest and most obvious.
Robert P
It's probably not the right answer based on his follow-up comment about wanting to sort on more file extensions.
brian d foy
+4  A: 

 

@sorted = sort { $b=~/\.txt$/ <=> $a=~/\.txt$/  ||  $a cmp $b } @files;

will put .txt files first and otherwise sort lexicographically (alphabetically).

@sorted = sort { $b=~/\.txt$/ <=> $a=~/\.txt$/ } @files;

will put .txt files first and otherwise preserve the original order (the default sort has been a stable mergesort since Perl 5.8; add use sort 'stable'; if you want that stability guaranteed rather than incidental)

mobrule
+1  A: 

Code golf? This will not produce nasty warnings:

@files = map { $_->[0] } sort { @$b <=> @$a } map { [$_, /\.txt$/] } @files
zakovyrya
Nah, I'm not playing Code Golf. I was writing an FTP client with Net::FTPSSL and came across a situation where I needed files downloaded in a specific order and I wondered if there was a better way to sort than I'm doing already. Thanks for answering.
rarbox
I'm almost sure that the @$b should be wrong, but I can't make this example not work. It seems to me that @$b should be coerced to a number instead of comparing something in the array, but I guess that's not happening. Why does it work?
brian d foy
@brian d foy - if /\.txt$/ matches, it returns 1 and the resulting array reference looks like ['foo.txt', 1]; if it doesn't match, /\.txt$/ returns an empty list and the array reference looks like ['foo.bin']. So arrays for matching names hold 2 elements and the others hold 1. And yes, you're right: in the sort block each array is coerced to a number, namely its element count.
zakovyrya
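The element-count behavior described in that comment can be checked directly. This is a small sketch; the helper name element_count and the two file names are invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the same kind of pair as the answer's map
# block and reports how many elements ended up in it.
sub element_count {
    my ($name) = @_;

    # Inside [ ... ] the match runs in list context: a successful match
    # with no capture groups yields (1), a failed match yields ().
    my $pair = [ $name, $name =~ /\.txt$/ ];

    return scalar @$pair;
}

printf "%s => %d element(s)\n", $_, element_count($_) for qw(foo.txt foo.bin);
```

Because @$b <=> @$a evaluates both arrays in numeric (scalar) context, the comparison is between these counts, which is why the two-element .txt entries sort first.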
Ah, tricky. Very nice. However, I wouldn't list that as good style. :)
brian d foy
@brian d foy - Yep, I wouldn't put this into production either
zakovyrya
+6  A: 

You asked in a follow-up comment about doing this for more than one file extension. In that case, I'd build off the Schwartzian Transform. If you're new to the ST, I recommend Joseph Hall's explanation in Effective Perl Programming. Although the Second Edition is coming out very soon, we basically left his explanation as is, so the first edition is just as good. Google Books seems to show only one inch of each page for the first edition, so you're out of luck there.

In this answer, I use a weighting function to decide which extensions should move to the top. If an extension doesn't have an explicit weight, I just sort it lexicographically. You can fool around with the sort to get exactly the order that you want:

@files = qw(
    buster.pdf
    mimi.xls
    roscoe.doc
    buster.txt
    mimi.txt
    roscoe.txt
    buster.rpm
    mimi.rpm
    );

my %weights = qw(
    txt 10
    rpm  9
    );

my @sorted = 
    map { $_->{name} }
    sort { 
        $b->{weight} <=> $a->{weight}
         ||
        $a->{ext}    cmp $b->{ext}
         ||
        $a->{name}   cmp $b->{name}   # compare names, not the hash refs
        }
    map {
        my( $ext ) = /\.([^.]+)\z/;
            { # anonymous hash constructor
            name => $_,
            ext => $ext,
            weight => $weights{$ext} || 0,
            }
        }
    @files;

$" = "\n";
print "@sorted\n";
brian d foy
+2  A: 

To handle multiple extensions efficiently, you could modify brian d foy's sorted greps by partitioning your array in one pass and then sorting each partition independently.

use strict;
use warnings;

use List::MoreUtils qw(part);

my @files = qw(
    bar        Bar.pm       bar.txt
    bar.jpeg   foo          foo.pm
    foo.jpeg   zebra.txt    zebra.pm
    foo.bat    foo.c        foo.pl
    Foo.pm     foo.png      foo.tt
    orange     apple        zebra.stripe
);


my @parts = part { get_extension_priority($_) } @files;

my @sorted = map { sort( @{ $_ || [] } ) } @parts; 

print map "$_\n", @sorted;

BEGIN {

    # Set extension priority order
    my @priority = qw( stripe txt nomatch pl jpeg  );

    # make a hash to look up priority by extension
    my %p = map { $priority[$_], $_ } 0..$#priority;

    sub get_extension_priority {
        my $file = shift;

        # match against the argument rather than relying on $_
        return scalar @priority 
            unless $file =~ /[.](\w*)$/;

        return scalar @priority 
            unless exists $p{$1};

        return $p{$1};
    }
}
daotoad
Very nice! Every time I see part() though, I wished he had just named it partition() :)
brian d foy