I have a folder named Lib, and I am using the File::Find module to search for it across a whole drive, say D:\. The search takes a long time, as much as 5 minutes if the drive has a lot of subdirectories. How can I search for Lib faster, so that it finishes in seconds?

My code looks like this:

    use strict;
    use warnings;
    use File::Find;

    my $dir = 'D:/';
    find(\&Lib_files, $dir);

    sub Lib_files
    {
        return unless -d;
        if ($_ =~ m/^([L|l]ib(.*))/)
        {
            print "$_\n";
        }
        return;
    }
A:

Searching the file system without a preexisting index is I/O bound; if it were not, products ranging from locate to Windows Desktop Search would not exist.

At the `D:\>` prompt, run `dir /b/s > directory.lst` and observe how long that command takes. You should not expect to beat that without indexing the files first.
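Once it exists, a listing like `directory.lst` is effectively a crude index: the slow walk happens once, and later searches only scan a text file. Below is a minimal sketch of that idea, assuming the listing fits in memory and is rebuilt whenever it goes stale. `build_listing` and `search_listing` are hypothetical names, and File::Find stands in for `dir /b/s` so the sketch is not tied to Windows.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);

# Walk $root once and write every directory path to $listing, one per line.
# This is the slow, I/O-bound step; rerun it only when the listing is stale.
sub build_listing {
    my ($root, $listing) = @_;
    open my $out, '>', $listing or die "Cannot write $listing: $!\n";
    find(sub { print {$out} "$File::Find::name\n" if -d $_ }, $root);
    close $out or die "Cannot close $listing: $!\n";
}

# Scan the saved listing (no disk walk) for directories whose own name
# starts with Lib or lib.
sub search_listing {
    my ($listing) = @_;
    open my $in, '<', $listing or die "Cannot read $listing: $!\n";
    chomp(my @paths = <$in>);
    close $in;
    return grep { basename($_) =~ /^[Ll]ib/ } @paths;
}
```

For example, run `build_listing('D:/', 'directory.lst')` once, then call `search_listing('directory.lst')` as often as needed.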

The major improvement you can make is to print less often: collect the matches and print them once at the end. A minor improvement is to avoid capturing parentheses when you are not going to use the capture:

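    use strict;
    use warnings;
    use File::Find;

    my $dir = 'D:/';
    my @dirs;

    sub Lib_files {
        # $_ is the bare name; File::Find has chdir'ed into its parent
        return unless -d $_;
        # No capturing parentheses: only a yes/no match is needed
        push @dirs, $File::Find::name if /^[Ll]ib/;
        return;
    }

    find(\&Lib_files, $dir);
    print "$_\n" for @dirs;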

On my system, a simple File::Find script that prints the names of all subdirectories under my home directory (about 150,000 files) takes a few minutes to run, whereas `dir %HOME% /ad/b/s > dir.lst` completes in about 20 seconds.

I would be inclined to use:

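    use strict;
    use warnings;
    use File::Basename;

    # Replace %HOME% with the directory to search, e.g. D:\
    my @dirs = grep { fileparse($_) =~ /^[Ll]ib/ }
               split /\n/, `dir %HOME% /ad/b/s`;
    print "$_\n" for @dirs;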

which completed in under 15 seconds on my system.

If there is a chance that some other dir.exe is in %PATH%, cmd.exe's built-in dir might not be the one invoked. To make sure the right dir runs, use `qx! cmd.exe /c dir %HOME% /ad/b/s !` instead.
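To see what the `grep`/`fileparse` filter above keeps, this sketch applies it to a short, made-up sample of `dir /ad/b/s` output instead of running `dir`. The call to `fileparse_set_fstype('MSWin32')` is an addition so that File::Basename splits on backslashes even when the snippet runs outside Windows; on Windows that is already the default.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse fileparse_set_fstype);

# Parse paths as Windows paths (backslash separators, drive letters).
fileparse_set_fstype('MSWin32');

# Made-up sample of `dir D:\ /ad/b/s` output, one directory per line.
my $listing = join "\n",
    'D:\\perl',
    'D:\\perl\\lib',
    'D:\\perl\\lib\\File',
    'D:\\perl\\site\\Lib',
    'D:\\projects\\library',
    'D:\\projects\\mylib';

# Same filter as above: fileparse in scalar context returns the last
# path component, which must start with Lib or lib.
my @dirs = grep { fileparse($_) =~ /^[Ll]ib/ } split /\n/, $listing;

print "$_\n" for @dirs;
```

It prints `D:\perl\lib`, `D:\perl\site\Lib` and `D:\projects\library`; `D:\projects\mylib` is dropped because the pattern is anchored at the start of the last path component.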

Sinan Ünür
+1 for not using capturing parentheses - but on the whole, it is likely to be a second-order effect compared to disk access time.
Jonathan Leffler
@Jonathan Leffler: Yeah, I should correct how I phrased that because the most important savings will come from not printing so often. However, it will be hard to beat `qx'dir d:\ /ad/b/s'` for this kind of thing.
Sinan Ünür
Don't you mean `/[Ll]ib/`? `[L|l]` is (roughly) equivalent to `(?:L|\||l)`, not `(?:L|l)`.
Robert P
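A small demonstration of that point, using made-up names: inside a character class `|` is an ordinary character, so `[L|l]` also accepts a name such as `|ib`.

```perl
use strict;
use warnings;

my @names = ('Lib', 'lib', 'library', '|ib', 'xlib');

# Inside [...] the | is a literal character, not alternation.
my @with_pipe = grep { /^[L|l]ib/ } @names;    # Lib lib library |ib
my @without   = grep { /^[Ll]ib/ } @names;     # Lib lib library

print "[L|l]: @with_pipe\n";
print "[Ll]:  @without\n";
```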
Sinan Ünür
Hi, can you give me the exact code? The above code displays "File not found" when I execute it.
lokesh
@lokesh That is the exact code I ran. **You** need to **replace** `%HOME%` with whatever directory you are trying to search.
Sinan Ünür
...and accept the answer.
Leonardo Herrera
Hi... I am getting the result "File not found" when I run this script: use File::Basename; my @dirs = grep { fileparse($_) =~ /^[L|l]ib/ } split /\n/, `dir e:\\/ad/b/s`; print @dirs; What could the problem be?
lokesh
A: 

How about not using the File::Find module at all?

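    use strict;
    use warnings;
    use Cwd;

    # Recursively walk $wdir, printing the name of every non-directory entry.
    sub find {
        my ($wdir) = @_;
        my $sdir = cwd();    # remember where we started
        chdir($wdir) or die "Unable to enter dir $wdir: $!\n";
        opendir(my $dh, '.') or die "Unable to open $wdir: $!\n";
        # readdir in list context reads all entries up front
        foreach my $name (readdir($dh)) {
            next if $name eq '.' or $name eq '..';
            if (-d $name) {
                find($name);    # recurse; it chdirs back here when done
                next;
            }
            print "$name\n";
        }
        closedir($dh);
        # Return to the parent only after every entry has been processed
        chdir($sdir) or die "Unable to change to dir $sdir: $!\n";
    }

    find('.');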
One way to optimise this would be to store each directory's entry names in a local array so that you can close the directory handle before recursing. Unfortunately, the example given holds each directory handle open while it inspects subdirectories, and if you have a very deep path you could be starving your system of handles.
PP
Sinan Ünür
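A minimal sketch of that suggestion, adapted to the original question: each directory's entries are read into a lexical array and the handle is closed before recursing, so only one directory handle is open at a time. Instead of `chdir`, it builds full paths with File::Spec, and it collects directories whose name matches `/^[Ll]ib/` rather than printing every file. `find_lib_dirs` is a hypothetical name, and skipping symbolic links (to avoid loops) is an assumption the original answer does not make.

```perl
use strict;
use warnings;
use File::Spec;

# Return every directory under $dir whose own name starts with Lib or lib.
sub find_lib_dirs {
    my ($dir) = @_;
    my @found;

    # Read all entries first, then close the handle before recursing,
    # so only one directory handle is open at any time.
    opendir(my $dh, $dir) or die "Unable to open $dir: $!\n";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);

    for my $name (@entries) {
        my $path = File::Spec->catdir($dir, $name);
        # Skip anything that is not a real directory (files, symlinks).
        next if -l $path || !-d _;
        push @found, $path if $name =~ /^[Ll]ib/;
        push @found, find_lib_dirs($path);
    }
    return @found;
}
```

For example, `print "$_\n" for find_lib_dirs('D:/');` prints the matching directories once the walk is complete.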