You sometimes hear it said of Perl that there are six different ways to approach any given problem. Good Perl developers usually have well-reasoned insights for choosing among the various possible implementations.

So an example Perl problem:

A simple script which recursively iterates through a directory structure, looking for files which were modified recently (after a certain date, which would be variable). Save the results to a file.

The question, for Perl developers: What is your best way to accomplish this?

A: 

I write a subroutine that reads a directory with readdir, throws out the "." and ".." entries, recurses if it finds a new directory, and examines the files for what I'm looking for (in your case, you'll want stat or the -M file test; utime sets timestamps rather than reading them). By the time the recursion is done, every file should have been examined.

I think all the functions you'd need for this script are described briefly here: http://www.cs.cf.ac.uk/Dave/PERL/node70.html

The semantics of input and output are a fairly trivial exercise which I'll leave to you.
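A bare-bones sketch of that approach (the one-day cutoff is only an example, and it deliberately does nothing about symlinks; see the comments below):

use strict;
use warnings;

my $cutoff = time() - 24 * 60 * 60;    # example cutoff: one day ago
my @recent;

sub scan {
    my ($dir) = @_;
    opendir(my $dh, $dir) or do { warn "can't open $dir: $!"; return };
    for my $entry (readdir $dh) {
        next if $entry eq '.' or $entry eq '..';
        my $path = "$dir/$entry";
        if (-d $path) {
            scan($path);                    # recurse into subdirectories
        }
        elsif (-f $path) {
            my $mtime = (stat $path)[9];    # element 9 of stat() is mtime
            push @recent, $path if $mtime > $cutoff;
        }
    }
    closedir $dh;
}

scan('.');
print "$_\n" for @recent;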

antik
Just pray you don't have a symbolic link that points to an ancestor directory, otherwise this simple-minded approach will loop forever.
dland
I usually script on Windows, where this isn't a problem. On Linux, checks could be written to detect symlinks and avoid the problem, if necessary. It would create a problem though, so thanks for pointing it out! And thank you for mentioning the simplicity of my answer: that was what was asked for...
antik
A: 

I risk getting downvoted, but IMHO the 'ls' command (with appropriate parameters) does this in the best-known, most performant way. In that case it might be quite a good solution to run 'ls' through a shell from the Perl code, capturing the results into an array or hash.

Edit: 'find' could also be used, as proposed in the comments.
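A rough sketch of the shell-out version (the path is hypothetical; 'find -mtime -1' selects files modified within the last day):

# *nix only; breaks on filenames containing newlines (see comments below).
my @recent = `find /some/dir -type f -mtime -1`;
chomp @recent;
print "$_\n" for @recent;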

Thevs
not much good if the script isn't being used on a *nix based OS though.
workmad3
ls can't do complex selections. Also it doesn't handle newlines embedded in names of files.
Leon Timmermans
If you must abandon portability and call a shell command, then 'find' is the one that matches the questioner's needs. However, File::Find achieves the same thing in native Perl, and is preferable.
slim
@workmad3: Do you really think anyone nowadays uses Perl on other systems? So I assume it's *nix. @Leon: Newlines in filenames are extremely geekish things rather than a real situation. Wasn't the question about the simplest way? Writing it in Perl is like writing games using awk, sed or bash...
Thevs
Thevs: Are you joking? I use Perl on Unix and Windows. I've heard that people also use it on Macs, VMS...
runrig
...though I do have ls on my Windows system via msys.
runrig
OS X has 'ls' in its BSD-like unix shell
Thevs
+8  A: 

File::Find is the right way to solve this problem. There is no use in reimplementing stuff that already exists in other modules, and reimplementing something that ships in a standard module should really be discouraged.

Leon Timmermans
File::Find does not do things as optimally as it could. It ignores any return from "wanted", so you can't stop it from traversing parts of the tree you don't need, if that were even possible, because it gathers all the paths first and returns them to you all at once.
Axeman
File::Find does not return any paths, it makes a call to your "wanted" function. You could "die()" in the wanted function and trap it with an "eval { }" around the find() if you wanted an early exit. File::Find::Rule does return all the paths.
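For instance, a minimal sketch of that early exit (the .log test is only a stand-in condition):

use File::Find;

my $first_match;
eval {
    find(sub {
        if (/\.log\z/) {                  # stand-in condition
            $first_match = $File::Find::name;
            die "early exit\n";           # abort the traversal
        }
    }, '.');
};
die $@ if $@ && $@ ne "early exit\n";     # re-throw unexpected errors
print "$first_match\n" if defined $first_match;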
runrig
Above comment was in reply to Axeman. Also, it is File::Find::Rule that gathers all the paths and returns them all at once, and can not exit early.
runrig
+4  A: 

My preferred method is to use the File::Find module, like so:

use strict;
use warnings;
use File::Find;

my $directory_to_check_recursively = '.';    # or wherever you want to start
find(\&checkFile, $directory_to_check_recursively);

sub checkFile
{
    # Examine each file in here. The filename is in $_, and you have been
    # chdir()ed into its directory. The directory is also available as
    # $File::Find::dir, and the full path as $File::Find::name.
}
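For the question's task, the callback body might be as simple as this (a sketch; the one-day cutoff is only an example):

sub checkFile
{
    # -M gives the file's age in days, relative to script start time.
    print "$File::Find::name\n" if -f $_ && -M _ < 1;
}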
workmad3
+15  A: 

Where the problem is solved mainly by standard libraries, use them.

File::Find in this case works nicely.

There may be many ways to do things in Perl, but where a very standard library exists to do something, it should be utilised unless it has problems of its own.

#!/usr/bin/perl

use strict;
use warnings;
use File::Find ();

File::Find::find( { wanted => \&wanted }, '.' );

sub wanted {
    my $time = time();
    my $days = 5 * 60 * 60 * 24;    # five days, expressed in seconds

    my @stat = stat($_);
    # Keep files modified within the last five days.
    if ( ( $time - $stat[9] ) <= $days ) {
        print "$_\n";
    }
}
Phil
No need to get the current time and convert days to seconds, ($days <= -M) would do
runrig
Or ($days >= -M), now that I read the OP.
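In code, that amounts to something like this (a sketch; -M measures file age in days from script start):

use File::Find ();

File::Find::find( { wanted => sub { print "$_\n" if -M $_ <= 5 } }, '.' );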
runrig
+15  A: 

This sounds like a job for File::Find::Rule:

#!/usr/bin/perl
use strict;
use warnings;
use autodie;  # Causes built-ins like open to succeed or die.
              # You can 'use Fatal qw(open)' if autodie is not installed.

use File::Find::Rule;
use Getopt::Std;

use constant SECONDS_IN_DAY => 24 * 60 * 60;

our %option = (
    m => 1,        # -m switch: days ago modified, defaults to 1
    o => undef,    # -o switch: output file, defaults to STDOUT
);

getopts('m:o:', \%option);

# If we haven't been given directories to search, default to the
# current working directory.

if (not @ARGV) {
    @ARGV = ( '.' );
}

print STDERR "Finding files changed in the last $option{m} day(s)\n";


# Convert our time in days into a timestamp in seconds from the epoch.
my $last_modified_timestamp = time() - SECONDS_IN_DAY * $option{m};

# Now find all the regular files, which have been modified in the last
# $option{m} days, looking in all the locations specified in
# @ARGV (our remaining command line arguments).

my @files = File::Find::Rule->file()
                            ->mtime(">= $last_modified_timestamp")
                            ->in(@ARGV);

# $out_fh will store the filehandle where we send the file list.
# It defaults to STDOUT.

my $out_fh = \*STDOUT;

if ($option{o}) {
    open($out_fh, '>', $option{o});
}

# Print our results.

print {$out_fh} join("\n", @files), "\n";
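Usage might look like this (recent.pl is a made-up name for the script above):

perl recent.pl -m 2 -o modified.txt /path/to/search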
pjf
Nice - although it has the disadvantage of not being a standard module.
slim
There's no such thing as a "standard" module. If you mean a module bundled with perl itself, those fall into two categories: historical, or of use in installing other modules; neither of which is a good reason to prefer them to something else from CPAN.
ysth
+9  A: 

There aren't six ways to do this: there's the old way, and the new way. The old way is File::Find, and you already have a couple of examples of that. File::Find has a pretty awful callback interface; it was cool 20 years ago, but we've moved on since then.

Here's a real-life (lightly amended) program I use to clear out the cruft on one of my production servers. It uses File::Find::Rule rather than File::Find. File::Find::Rule has a nice declarative interface that reads easily.

Randal Schwartz also wrote File::Finder, as a wrapper over File::Find. It's quite nice but it hasn't really taken off.

#! /usr/bin/perl -w

# delete temp files on agr1

use strict;
use File::Find::Rule;
use File::Path 'rmtree';

for my $file (

    File::Find::Rule->new
        ->mtime( '<' . days_ago(2) )
        ->name( qr/^CGItemp\d+$/ )
        ->file()
        ->in('/tmp'),

    File::Find::Rule->new
        ->mtime( '<' . days_ago(20) )
        ->name( qr/^listener-\d{4}-\d{2}-\d{2}-\d{4}\.log$/ )
        ->file()
        ->maxdepth(1)
        ->in('/usr/oracle/ora81/network/log'),

    File::Find::Rule->new
        ->mtime( '<' . days_ago(10) )
        ->name( qr/^batch[_-]\d{8}-\d{4}\.run\.txt$/ )
        ->file()
        ->maxdepth(1)
        ->in('/var/log/req'),

    File::Find::Rule->new
        ->mtime( '<' . days_ago(20) )
        ->or(
            File::Find::Rule->name( qr/^remove-\d{8}-\d{6}\.txt$/ ),
            File::Find::Rule->name( qr/^insert-tp-\d{8}-\d{4}\.log$/ ),
        )
        ->file()
        ->maxdepth(1)
        ->in('/home/agdata/import/logs'),

    File::Find::Rule->new
        ->mtime( '<' . days_ago(90) )
        ->or(
            File::Find::Rule->name( qr/^\d{8}-\d{6}\.txt$/ ),
            File::Find::Rule->name( qr/^\d{8}-\d{4}\.report\.txt$/ ),
        )
        ->file()
        ->maxdepth(1)
        ->in('/home/agdata/redo/log'),

) {
    if (unlink $file) {
        print "ok $file\n";
    }
    else {
        print "fail $file: $!\n";
    }
}

{
    my $now;
    sub days_ago {
        # turn a number of days into an epoch-seconds cutoff
        $now ||= time;
        return $now - (86400 * shift);
    }
}
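For comparison, the File::Finder mentioned above might look something like this (an untested sketch, assuming its find(1)-style chained interface):

use File::Finder;

# find(1) semantics: -mtime -2 means "modified less than two days ago".
my @recent = File::Finder->type('f')->mtime(-2)->in('/tmp');
print "$_\n" for @recent;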
dland
+8  A: 

Others have mentioned File::Find, which is the way I'd go, but you asked for an iterator, which File::Find isn't (nor is File::Find::Rule). You might want to look at File::Next or File::Find::Object, which do have iterative interfaces. Mark Jason Dominus goes over building your own in chapter 4.2.2 of Higher Order Perl.
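A short sketch of File::Next's iterator style (the one-day -M test is only an example):

use File::Next;

my $iter = File::Next::files('/some/dir');
while (defined(my $file = $iter->())) {
    print "$file\n" if -M $file < 1;    # modified within the last day
}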

runrig
And to be fair to MJD, File::Next is directly ripped off from his book.
Andy Lester
+3  A: 

I wrote File::Find::Closures as a set of closures that you can use with File::Find so you don't have to write your own. There are a couple of mtime functions that should handle this case:

use File::Find;
use File::Find::Closures qw(:all);

my( $wanted, $list_reporter ) = find_by_modified_after( time - 86400 );
#my( $wanted, $list_reporter ) = find_by_modified_before( time - 86400 );

File::Find::find( $wanted, @directories );

my @modified = $list_reporter->();

You don't really need to use the module because I mostly designed it as a way that you could look at the code and steal the parts that you wanted. In this case it's a little trickier because all the subroutines that deal with stat depend on a second subroutine. You'll quickly get the idea from the code though.

Good luck,

brian d foy
+4  A: 

There's my File::Finder, as already mentioned, but there's also my iterator-as-a-tied-hash solution from Finding Files Incrementally (Linux Magazine).

Randal Schwartz