tags:
views: 325
answers: 7

Hi all,

I have a question about Perl's readdir(). I want to gather all the files in a directory that share a prefix I specify. So, for each prefix, I need to use readdir() and grep for the related files.

Suppose the prefix is "abc", there are several files with the names "abc_1", "abc_2", etc.

However, I noticed that if I put opendir/closedir outside of the loop (which iterates over a list of file-name prefixes), I can only grep the very first prefix from the directory -- all subsequent greps fail. If I call opendir and closedir each time through the loop, it works fine, but I'm afraid that is not efficient at all.

My question is: how can I make this more efficient? It is weird that I can't call readdir multiple times in a loop.

Thanks a lot in advance!

-Jin

+6  A: 

Why don't you read all the files once and then perform the filtering on that list?
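A minimal sketch of this read-once-then-filter approach (the directory name and prefixes below are illustrative):

```perl
use strict;
use warnings;

my $dir = ".";                      # illustrative directory
opendir(my $dh, $dir) or die "opendir $dir: $!";
my @all = readdir $dh;              # read the directory exactly once
closedir $dh;

# Filter the in-memory list for each prefix -- no further directory IO.
for my $prefix (qw(abc xyz)) {      # illustrative prefixes
    my @matches = grep { /^\Q$prefix\E_/ } @all;
    print "$prefix: @matches\n";
}
```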

jamessan
Assuming the directory isn't too big (depends on the filesystem), this would seem preferable. It would cut down on a lot of IO.
mopoke
+6  A: 

Directory (and file) handles are iterators. Reading from one consumes data; you need to either store that data or reset the position of the iterator. Closing and reopening is the hard way; use rewinddir instead.

Alternately, use glob to do the reading and filtering in one step.
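A sketch of the rewinddir approach -- one open handle, rewound before each pass (directory and prefixes are illustrative):

```perl
use strict;
use warnings;

my $dir = ".";                      # illustrative directory
opendir(my $dh, $dir) or die "opendir $dir: $!";

for my $prefix (qw(abc xyz)) {      # illustrative prefixes
    rewinddir $dh;                  # reset the iterator before each pass
    my @matches = grep { /^\Q$prefix\E_/ } readdir $dh;
    print "$prefix: @matches\n";
}

closedir $dh;
```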

Michael Carman
+1  A: 

Would rewinddir() be of assistance at this juncture?

Penfold
+1  A: 

Why dontcha just let @files = <abc_*>?

Zano
A: 

I would code this in a single pass as follows:

while readdir() returns a file name
    if the file prefix has not been seen before
        record prefix and create directory for this prefix
    end if
    move (copy?) file to correct directory
end while

For the anally retentive, here is some (untested) code that should work. Error handling is left as an exercise for the reader.

use File::Copy qw(move);

my $old_base_dir = "original_directory_path";
opendir(my $dir_handle, $old_base_dir) or die "opendir $old_base_dir: $!";

my %dir_list;
my $new_base_dir = "new_directory_path";

while (my $file_name = readdir($dir_handle)) {
    next unless -f "$old_base_dir/$file_name";  # only move regular files
    my ($prefix) = split /_/, $file_name, 2;    # assume first _ marks end of prefix

    unless (exists $dir_list{$prefix}) {
        mkdir "$new_base_dir/$prefix";          # create directory on first sighting
        $dir_list{$prefix} = 1;                 # record that we've seen this prefix
    }

    move("$old_base_dir/$file_name", "$new_base_dir/$prefix/$file_name");
}

closedir($dir_handle);
David Harris
A) That's not Perl B) on all Perls prior to 5.11.2 you have to do `while(defined( local $_ = readdir )){ ... }`
Brad Gilbert
Agreed!!! I was answering the OP's question using pseudo-code to show him the general idea of how to make his approach more efficient and obtain the result he needed in a single pass.
David Harris
A: 

A lot of your suggestions worked! I appreciate that!

Jamessan, I totally agree with you. Operating in memory is way faster than frequent file IO. Thanks!

Zano's way is very neat; I like it a lot. But in my case, @files ends up containing the full path of each file because I specified path information inside the "<>". I know it's not difficult to strip the paths, and it is very fast! Thanks!
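For what it's worth, stripping the directory part from glob results can be done with the core File::Basename module (the path and prefix below are illustrative):

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my @files = glob "some_dir/abc_*";        # glob returns paths, not bare names
my @names = map { basename($_) } @files;  # drop the leading directory part
print "$_\n" for @names;
```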

Thank you all for your responsive feedback!

Jin
A: 

Use the Text::Trie module to group files in one pass through readdir:

use File::Spec::Functions qw/ catfile /;
use Text::Trie qw/ Trie walkTrie /;

sub group_files {
  my($dir,$pattern) = @_;

  opendir my $dh, $dir or die "$0: opendir $dir: $!";

  my @trie = Trie readdir $dh;

  my @groups;
  my @prefix;
  my $group = [];

  my $exitnode = sub {
    pop @prefix;
    unless (@prefix) {
      push @groups => $group if @$group;
      $group = [];
    }
  };

  my $leaf = sub {
    local $_ = join "" => @prefix;
    if (/$pattern/) {
      my $full = catfile $dir => "$_$_[0]";
      push @$group => $full if -f $full;
    }
    $exitnode->() unless @prefix;
  };

  my $node = sub { push @prefix => $_[0] };

  @$_[0,1,5] = ($leaf, $node, $exitnode) for \my @callbacks;
  walkTrie @callbacks => @trie;

  wantarray ? @groups : \@groups;
}

You might use it as in

my($pattern,$dir) = @ARGV;

$pattern //= "^";
$dir     //= ".";

my $qr = eval "qr/$pattern/" || die "$0: bad pattern ($pattern)\n";
my @groups = group_files $dir, $qr;

use Data::Dumper;
print Dumper \@groups;

For example:

$ ls
abc_1  abc_12  abc_2  abc_3  abc_4  prefixes  xy_7  xyz_1  xyz_2  xyz_3

$ ./prefixes
$VAR1 = [
          [
            './prefixes'
          ],
          [
            './abc_4',
            './abc_1',
            './abc_12',
            './abc_3',
            './abc_2'
          ],
          [
            './xy_7',
            './xyz_1',
            './xyz_3',
            './xyz_2'
          ]
        ];

Use the optional regular-expression argument as a predicate on prefixes:

$ ./prefixes '^.{3,}'
$VAR1 = [
          [
            './abc_4',
            './abc_1',
            './abc_12',
            './abc_3',
            './abc_2'
          ],
          [
            './xyz_1',
            './xyz_3',
            './xyz_2'
          ]
        ];

$ ./prefixes '^.{2,}'
$VAR1 = [
          [
            './abc_4',
            './abc_1',
            './abc_12',
            './abc_3',
            './abc_2'
          ],
          [
            './xy_7',
            './xyz_1',
            './xyz_3',
            './xyz_2'
          ]
        ];
Greg Bacon