I have a Perl script that traverses a directory hierarchy using File::Next::files. It returns to the script only files that end in ".avi", ".flv", ".mp3", ".mp4", or ".wmv". It also skips the ".svn" subdirectory and any subdirectory whose name ends in ".frames". This is specified in the file_filter and descend_filter subroutines below.

my $iter = File::Next::files(
        { file_filter => \&file_filter, descend_filter => \&descend_filter },
        $directory );

sub file_filter { 
    # Called from File::Next::files.
    # Only select video files that end with the following extensions.
    /.(avi|flv|mp3|mp4|wmv)$/
}

sub descend_filter { 
    # Called from File::Next::files.
    # Skip subfolders that either end in ".frames" or are named ".svn".
    $File::Next::dir !~ /.frames$|^.svn$/
}

What I want to do is place the allowed file extensions and disallowed subdirectory names in a configuration file so they can be updated on the fly.

How do I code the subroutines so that they build regex constructs like these from the parameters in the configuration file?

/.(avi|flv|mp3|mp4|wmv)$/

$File::Next::dir !~ /.frames$|^.svn$/
+16  A: 

Assuming that you've parsed the configuration file to get a list of extensions and ignored directories, you can build the regular expression as a string and then use the qr operator to compile it into a regular expression:

my @extensions = qw(avi flv mp3 mp4 wmv);  # parsed from file
my $pattern    = '\.(' . join('|', map { quotemeta($_) } @extensions) . ')$';
my $regex      = qr/$pattern/;

if ($file =~ $regex) {
    # do something
}

The compilation isn't strictly necessary; you can use the string pattern directly:

if ($file =~ /$pattern/) {
    # do something
}
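A minimal sketch that wraps this in a reusable sub, assuming the extensions arrive as a plain list of strings (the sub name extension_regex is hypothetical). Because every entry goes through quotemeta, a config value containing regex metacharacters, such as "c++", cannot break the pattern, and the escaped dot means "fakeavi" is no longer accepted:

```perl
use strict;
use warnings;

# Hypothetical helper: build an anchored regex that matches a literal
# "." followed by any one of the given extensions at the end of a name.
sub extension_regex {
    my @extensions  = @_;
    my $alternation = join '|', map { quotemeta($_) } @extensions;
    return qr/\.(?:$alternation)$/;
}

my $re = extension_regex(qw(avi flv mp3 mp4 wmv));

for my $name ('clip.avi', 'song.mp3', 'notes.txt', 'fakeavi') {
    printf "%-10s %s\n", $name, ($name =~ $re ? 'wanted' : 'skipped');
}
```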

Directories are a little harder because you have two different situations: full names and suffixes. Your configuration file will have to use different keys to make it clear which is which, e.g. "dir_name" and "dir_suffix". For full names I'd just build a hash:

my %ignore = ('.svn' => 1);
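When the configuration yields several full directory names, the same hash can be built from the list with map. A small sketch, assuming the names have already been read into a hypothetical @dir_name array:

```perl
use strict;
use warnings;

# Hypothetical list of full directory names read from the config file.
my @dir_name = ('.svn', 'CVS', '.git');

# Turn the list into a lookup table: name => 1.
my %ignore = map { $_ => 1 } @dir_name;

for my $dir ('.svn', 'src', 'CVS') {
    print "$dir: ", ($ignore{$dir} ? 'skip' : 'descend'), "\n";
}
```

A hash lookup is an exact string comparison, so a client typing ".svn" cannot accidentally introduce regex syntax.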

Suffixed directories can be done the same way as file extensions:

my $dir_pattern = '(?:' . join('|', map { quotemeta($_) } @dir_suffix) . ')$';
my $dir_regex   = qr/$dir_pattern/;

You could even build the patterns into anonymous subroutines to avoid referencing global variables:

my $file_filter    = sub { $_ =~ $regex };
my $descend_filter = sub {
    ! $ignore{$File::Next::dir} &&
    $File::Next::dir !~ $dir_regex;
};

my $iter = File::Next::files({
    file_filter    => $file_filter,
    descend_filter => $descend_filter,
}, $directory);
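To check the directory logic without walking a real tree, the test can be factored into a sub that returns the closure. This is a sketch under stated assumptions: the names and suffixes are already parsed into arrays, make_descend_filter is a hypothetical name, and the closure tests whatever string is in $_ (adapt it to $File::Next::dir if that is what you match against). The guard on an empty suffix list avoids building the pattern (?:)$, which would match every name:

```perl
use strict;
use warnings;

# Hypothetical factory: returns a closure that is true when a directory
# should be descended into. The directory name is read from $_.
sub make_descend_filter {
    my ($names, $suffixes) = @_;             # array refs from the config file
    my %ignore    = map { $_ => 1 } @$names; # exact names to skip
    my $suffix    = join '|', map { quotemeta($_) } @$suffixes;
    my $suffix_re = qr/(?:$suffix)$/;        # suffixes to skip

    return sub {
        return 0 if $ignore{$_};
        return 0 if @$suffixes && $_ =~ $suffix_re;
        return 1;
    };
}

my $descend = make_descend_filter(['.svn'], ['.frames']);

for ('.svn', 'clip.frames', 'frames', 'src') {
    print "$_: ", ($descend->() ? 'descend' : 'skip'), "\n";
}
```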
Michael Carman
What I didn't explain was that clients will be modifying the configuration file. I can't assume they will know Perl, or know enough not to introduce a syntax error into a regular expression. So I really don't want to read a regular expression from the configuration file; I just want a list of file extensions and directory names and/or directory patterns. Example:
ext = avi
ext = flv
ext = mp3
dir = .svn
dirp= .frames
Once this information is read, I want to dynamically create something that will function like: .(avi|flv|mp3|mp4|wmv)$
Dr. Faust
Ah, that wasn't clear to me before. I've revised my answer.
Michael Carman
+2  A: 

Let's say that you use Config::General for your config file and that it contains these lines:

<MyApp>
    extensions    avi flv mp3 mp4 wmv
    unwanted      frames svn
</MyApp>

You could then use it like so (see the Config::General documentation for more):

use Config::General;

my %conf = Config::General->new('/path/to/myapp.conf')->getall();
my $extension_string = $conf{'MyApp'}{'extensions'};

my @extensions = split ' ', $extension_string;

# Some sanity checks maybe...

my $regex_builder = join '|', @extensions;

$regex_builder = '\.(' . $regex_builder . ')$';

my $regex = qr/$regex_builder/;

if($file =~ m{$regex}) {
    # Do something.
}


my $uw_regex_builder = '\.(' . join ('|', split (' ', $conf{'MyApp'}{'unwanted'})) . ')$';
my $unwanted_regex = qr/$uw_regex_builder/;

if($File::Next::dir !~ m{$unwanted_regex}) {
    # Do something. (Note that this does not enforce /^.svn$/. You
    # will need some kind of agreed syntax in your conf-file for that.)
}

(This is completely untested.)
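The "# Some sanity checks maybe..." step matters when clients edit the file by hand. A sketch of one possible check, assuming extensions should consist only of letters and digits and that a leading dot is a harmless typo (both rules are assumptions; adjust them to your needs). It treats the value as one whitespace-separated string, as the code above does, rejects anything suspicious, and only then builds the regex. validated_extensions is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical validator: split a whitespace-separated config value and
# keep only entries made of letters and digits, warning about the rest.
sub validated_extensions {
    my ($value) = @_;
    my @good;
    for my $raw (split ' ', $value) {      # ' ' splits on any whitespace
        (my $ext = $raw) =~ s/^\.//;       # assumption: tolerate a leading dot
        if ($ext =~ /^[A-Za-z0-9]+$/) {
            push @good, $ext;
        }
        else {
            warn "Ignoring invalid extension '$raw'\n";
        }
    }
    return @good;
}

my @extensions  = validated_extensions('avi .flv  mp3 mp4 wmv *.mov');
my $alternation = join '|', @extensions;
my $regex       = qr/\.(?:$alternation)$/;

print "Using: @extensions\n";
```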

Anon
Thanks. By the way, why is the my $regex = qr/$regex_builder/ statement necessary?
Dr. Faust
It isn't necessary to build the whole regex into a string before using `qr//`. You can just do this: `my $regex_builder = join '|', @extensions; my $regex = qr/\.($regex_builder)$/;`
rjray
+2  A: 

Build it like you would a normal string, then use interpolation at the end to turn it into a compiled regex. Also, be careful: you are not escaping the . or putting it in a character class, so it matches any character rather than a literal period.

#!/usr/bin/perl

use strict;
use warnings;

my (@ext, $dir, $dirp);
while (<DATA>) {
    next unless my ($key, $val) = /^ \s* (ext|dirp|dir) \s* = \s* (\S+)$/x;
    push @ext, $val if $key eq 'ext';
    $dir = $val     if $key eq 'dir';
    $dirp = $val    if $key eq 'dirp';
}

my $re = join "|", @ext;
$re = qr/[.]($re)$/;

print "$re\n";

while (<>) {
    print /$re/ ? "matched" : "didn't match", "\n";
}

__DATA__
ext = avi
ext = flv
ext = mp3
dir = .svn
dirp= .frames
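The parser above keeps only the last dir and dirp value it sees. A variant that keeps every one of them, sketched here with the config text in a heredoc instead of __DATA__ so the snippet is self-contained; the quotemeta calls and the %skip_dir hash are additions, not part of the original answer:

```perl
use strict;
use warnings;

# Same "key = value" format as above, but every dir and dirp line is kept.
my $config_text = <<'END';
ext = avi
ext = flv
ext = mp3
dir = .svn
dir = CVS
dirp= .frames
END

my (@ext, @dir, @dirp);
for (split /\n/, $config_text) {
    next unless my ($key, $val) = /^ \s* (ext|dirp|dir) \s* = \s* (\S+) \s* $/x;
    push @ext,  $val if $key eq 'ext';
    push @dir,  $val if $key eq 'dir';
    push @dirp, $val if $key eq 'dirp';
}

# Escape every value so client input is always treated literally.
my $ext_alt  = join '|', map { quotemeta($_) } @ext;
my $dirp_alt = join '|', map { quotemeta($_) } @dirp;
my $ext_re   = qr/[.](?:$ext_alt)$/;
my $dirp_re  = qr/(?:$dirp_alt)$/;
my %skip_dir = map { $_ => 1 } @dir;

print "files: $ext_re\n";
print "dirs:  $dirp_re, plus exact names @dir\n";
```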
Chas. Owens
When I ran the code and printed out $re, I got `(?-xism:[.](avi|flv|mp3)$)`. Seems to work. Thanks very much.
Dr. Faust
I'd assume that there could be multiple values for directories and/or directory suffixes to ignore, although that wasn't explicitly specified.
Michael Carman
+1  A: 

It's reasonably straightforward with File::Find::Rule; it's just a case of creating the lists beforehand.

use strict;
use warnings;
use aliased 'File::Find::Rule';


# name can do both styles. 
my @ignoredDirs = (qr/^\.svn$/,  '*.frames' );
my @wantExt = qw( *.avi *.flv *.mp3 );

my $finder = Rule->or( 
    Rule->new->directory->name(@ignoredDirs)->prune->discard, 
    Rule->new->file->name(@wantExt)
);

$finder->start('./');

while( my $file = $finder->match() ){
    # Matching file.
}

Then it's just a case of populating those arrays. (Note: the code above is also untested, but will likely work.) I'd generally use YAML for this; it makes life easier.

use strict;
use warnings;
use aliased 'File::Find::Rule';
use YAML::XS;

my $config = YAML::XS::Load(<<'EOF');
---
ignoredir:
- !!perl/regexp (?-xism:^\.svn$)
- '*.frames'
want:
- '*.avi'
- '*.flv'
- '*.mp3'
EOF

my $finder = Rule->or( 
    Rule->new->directory->name(@{ $config->{ignoredir} })->prune->discard, 
    Rule->new->file->name(@{ $config->{want} })
);

$finder->start('./');

while( my $file = $finder->match() ){
    # Matching file.
}

Note: this uses the handy module aliased, which imports File::Find::Rule for me as Rule.

  • File::Find::Rule - Alternative interface to File::Find
  • YAML::XS - Perl YAML Serialization using XS and libyaml
  • aliased - Use shorter versions of class names.
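File::Find::Rule, YAML::XS and aliased are not core modules. For comparison, here is a sketch of the same prune-and-filter idea using only the core File::Find module instead; the config values are hard-coded placeholders and find_media is a hypothetical name. Setting $File::Find::prune inside the wanted callback is what stops File::Find from descending into a directory:

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical config values; these would come from the config file.
my %skip_dir   = map { $_ => 1 } ('.svn');
my @dir_suffix = ('.frames');
my @want_ext   = qw(avi flv mp3);

my $suffix_alt = join '|', map { quotemeta($_) } @dir_suffix;
my $ext_alt    = join '|', map { quotemeta($_) } @want_ext;
my $suffix_re  = qr/(?:$suffix_alt)$/;
my $ext_re     = qr/\.(?:$ext_alt)$/;

sub find_media {
    my ($top) = @_;
    my @found;
    find(sub {
        # $_ is the basename; $File::Find::name is the full path.
        if (-d $_) {
            # Never prune the starting directory (passed in as '.').
            $File::Find::prune = 1
                if $_ ne '.' && ($skip_dir{$_} || $_ =~ $suffix_re);
            return;
        }
        push @found, $File::Find::name if -f $_ && $_ =~ $ext_re;
    }, $top);
    return sort @found;
}

# Usage: perl find_media.pl /path/to/videos
print "$_\n" for find_media($ARGV[0]) if @ARGV;
```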
Kent Fredric
+1  A: 

If you want to build a potentially large regexp and don't want to bother debugging the parentheses, use a Perl module to build it for you!

use strict;
use Regexp::Assemble;

my $re = Regexp::Assemble->new->add(qw(avi flv mp3 mp4 wmv));

...

if ($file =~ /\.$re$/) {
    # a match!
}

print "$re\n"; # (?:(?:fl|wm)v|mp[34]|avi)
dland
Very cool - thanks.
Dr. Faust
A: 

Although File::Find::Rule already has ways to deal with this, in similar cases you don't really want a regex. The regex doesn't buy you much here because you're looking for a fixed sequence of characters at the end of each filename. You want to know if that fixed sequence is in a list of sequences that interest you. Store all the extensions in a hash and look in that hash:

my( $extension ) = $filename =~ m/\.([^.]+)$/;
if( defined $extension and exists $hash{$extension} ) { ... }

You don't need to build up a regular expression, and you don't need to go through several possible regex alternations to check every extension you have to examine.
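A self-contained sketch of the hash approach. The case-insensitive comparison via lc is an added assumption (so "clip.AVI" counts as an avi file), and is_wanted and %wanted are hypothetical names. The defined check covers file names with no dot at all:

```perl
use strict;
use warnings;

# Hypothetical set of wanted extensions, e.g. read from the config file.
my %wanted = map { lc($_) => 1 } qw(avi flv mp3 mp4 wmv);

sub is_wanted {
    my ($filename) = @_;
    # Capture everything after the last dot, if there is one.
    my ($extension) = $filename =~ m/\.([^.]+)$/;
    return defined $extension && exists $wanted{ lc $extension };
}

for my $name ('clip.AVI', 'song.mp3', 'README', 'archive.tar.gz') {
    print "$name: ", (is_wanted($name) ? 'wanted' : 'skipped'), "\n";
}
```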

brian d foy
Thank you - much appreciated.
Dr. Faust