tags:
views: 325
answers: 7

Hi all,

I have a question about Perl's readdir(). I want to gather all the files in a directory that share a prefix I specify. So, for each prefix, I need to use readdir() and grep for the related files.

Suppose the prefix is "abc", there are several files with the names "abc_1", "abc_2", etc.

However, I noticed that if I put opendir/closedir outside of the loop (which iterates over a list of file-name prefixes), I can only grep the very first prefix from the directory -- all subsequent greps fail. If I call opendir and closedir each time through the loop, it works fine, but I'm afraid that is not efficient at all.

My question is: how can I make this more efficient? It is weird that I can't call readdir multiple times in a loop.

Thanks a lot in advance!

-Jin

+6  A: 

Why don't you read all the files once and then perform the filtering on that list?
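A minimal sketch of this read-once-then-filter approach (the directory name and prefixes below are illustrative):

```perl
use strict;
use warnings;

my $dir = ".";                      # illustrative directory
opendir(my $dh, $dir) or die "opendir $dir: $!";
my @all = readdir $dh;              # read the directory exactly once
closedir $dh;

# Filter the in-memory list for each prefix -- no further directory IO.
for my $prefix (qw(abc xyz)) {      # illustrative prefixes
    my @matches = grep { /^\Q$prefix\E_/ } @all;
    print "$prefix: @matches\n";
}
```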

jamessan
Assuming the directory isn't too big (depends on the filesystem), this would seem preferable. It would cut down on a lot of IO.
mopoke
+6  A: 

Directory (and file) handles are iterators. Reading from one consumes data; you need to either store that data or reset the position of the iterator. Closing and reopening is the hard way; use rewinddir instead.

Alternately, use glob to do the reading and filtering in one step.
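A sketch of the rewinddir approach -- one open handle, rewound before each pass (directory and prefixes are illustrative):

```perl
use strict;
use warnings;

my $dir = ".";                      # illustrative directory
opendir(my $dh, $dir) or die "opendir $dir: $!";

for my $prefix (qw(abc xyz)) {      # illustrative prefixes
    rewinddir $dh;                  # reset the iterator before each pass
    my @matches = grep { /^\Q$prefix\E_/ } readdir $dh;
    print "$prefix: @matches\n";
}

closedir $dh;
```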

Michael Carman
+1  A: 

Would rewinddir() be of assistance at this juncture?

Penfold
+1  A: 

Why dontcha just let @files = <abc_*>?

Zano
A: 

I would code this in a single pass as follows:

while readdir() returns a file name
    if the file prefix has not been seen before
        record prefix and create directory for this prefix
    end if
    move (copy?) file to correct directory
end while

For the anally retentive, here is some (untested) code that should work. Error handling is left as an exercise for the reader.

use File::Copy qw(move);

my $old_base_dir = "original_directory_path";
opendir(my $dir_handle, $old_base_dir) or die "opendir $old_base_dir: $!";

my %dir_list;
my $new_base_dir = "new_directory_path";

while (my $file_name = readdir($dir_handle)) {
    next unless -f "$old_base_dir/$file_name";  # only move regular files
    my ($prefix) = split /_/, $file_name, 2;    # assume first _ marks end of prefix

    unless (exists $dir_list{$prefix}) {
        mkdir "$new_base_dir/$prefix";          # create directory on first sighting
        $dir_list{$prefix} = 1;                 # record that we've seen this prefix
    }

    move("$old_base_dir/$file_name", "$new_base_dir/$prefix/$file_name");
}

closedir($dir_handle);
David Harris
A) That's not Perl B) on all Perls prior to 5.11.2 you have to do `while(defined( local $_ = readdir )){ ... }`
Brad Gilbert
Agreed!!! I was answering the OP's question using pseudo-code to show him the general idea of how to make his approach more efficient and obtain the result he needed in a single pass.
David Harris
A: 

A lot of your suggestions worked! I appreciate that!

Jamessan, I totally agree with you. Operating in memory is way faster than frequent file IO. Thanks!

Zano's way is very neat; I like it a lot. But in my case, @files ends up containing the full path of each file because I specified path information inside the "<>". I know it's not difficult to strip the paths, and it is very fast! Thanks!
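For what it's worth, stripping the directory part from glob results can be done with the core File::Basename module (the path and prefix below are illustrative):

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my @files = glob "some_dir/abc_*";        # glob returns paths, not bare names
my @names = map { basename($_) } @files;  # drop the leading directory part
print "$_\n" for @names;
```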

Thank you all for your responsive feedback!

Jin
A: 

Use the Text::Trie module to group files in one pass through readdir:

use File::Spec::Functions qw/ catfile /;
use Text::Trie qw/ Trie walkTrie /;

sub group_files {
  my($dir,$pattern) = @_;

  opendir my $dh, $dir or die "$0: opendir $dir: $!";

  my @trie = Trie readdir $dh;

  my @groups;
  my @prefix;
  my $group = [];

  my $exitnode = sub {
    pop @prefix;
    unless (@prefix) {
      push @groups => $group if @$group;
      $group = [];
    }
  };

  my $leaf = sub {
    local $_ = join "" => @prefix;
    if (/$pattern/) {
      my $full = catfile $dir => "$_$_[0]";
      push @$group => $full if -f $full;
    }
    $exitnode->() unless @prefix;
  };

  my $node = sub { push @prefix => $_[0] };

  @$_[0,1,5] = ($leaf, $node, $exitnode) for \my @callbacks;
  walkTrie @callbacks => @trie;

  wantarray ? @groups : \@groups;
}

You might use it as in

my($pattern,$dir) = @ARGV;

$pattern //= "^";
$dir     //= ".";

my $qr = eval "qr/$pattern/" || die "$0: bad pattern ($pattern)\n";
my @groups = group_files $dir, $qr;

use Data::Dumper;
print Dumper \@groups;

For example:

$ ls
abc_1  abc_12  abc_2  abc_3  abc_4  prefixes  xy_7  xyz_1  xyz_2  xyz_3

$ ./prefixes
$VAR1 = [
          [
            './prefixes'
          ],
          [
            './abc_4',
            './abc_1',
            './abc_12',
            './abc_3',
            './abc_2'
          ],
          [
            './xy_7',
            './xyz_1',
            './xyz_3',
            './xyz_2'
          ]
        ];

Use the optional regular-expression argument as a predicate on prefixes:

$ ./prefixes '^.{3,}'
$VAR1 = [
          [
            './abc_4',
            './abc_1',
            './abc_12',
            './abc_3',
            './abc_2'
          ],
          [
            './xyz_1',
            './xyz_3',
            './xyz_2'
          ]
        ];

$ ./prefixes '^.{2,}'
$VAR1 = [
          [
            './abc_4',
            './abc_1',
            './abc_12',
            './abc_3',
            './abc_2'
          ],
          [
            './xy_7',
            './xyz_1',
            './xyz_3',
            './xyz_2'
          ]
        ];
Greg Bacon