I would like to check whether a string matches any of a given set of regexes. How can I do that?

+3  A: 

I'm not exactly sure what you are looking for, but is it something like this?

#!/usr/bin/perl
@regexes = ( qr/foo/ , qr/bar/ );
while ($line=<>){
  chomp $line;
  $match=0;
  for $re (@regexes){
    $match++ if ($line =~ $re);
  }
  print "$line matches $match regexes\n";
}

You could also compile all of them into a single regex like this:

#!/usr/bin/perl
@regexes = ( qr/foo/ , qr/bar/ );
$allre= "(".join("|",@regexes).")";
$compiled=qr/$allre/;
while(<>){
  chomp;
  print "$_ matches ($1)\n" if /$compiled/;
}

Hope that helps.
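If you only need to know whether anything matched, the counting loop above can stop at the first hit. Here is a minimal sketch under strict and warnings; the helper name first_match_index is made up for illustration and is not part of the original answer:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the index of the first pattern that
# matches $string, or -1 if none does. It returns as soon as one
# pattern matches, so the remaining patterns are never tried.
sub first_match_index {
    my ($string, @regexes) = @_;
    for my $i (0 .. $#regexes) {
        return $i if $string =~ $regexes[$i];
    }
    return -1;
}

my @regexes = ( qr/foo/, qr/bar/ );
for my $line ('a foo here', 'bar none', 'nothing') {
    my $i       = first_match_index($line, @regexes);
    my $verdict = $i >= 0 ? "matched pattern $i" : "no match";
    print "$line: $verdict\n";
}
```

Returning the index rather than a plain true/false also tells you which pattern matched, which the single combined regex cannot do without extra capture groups.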

Sec
+13  A: 

Use smart matching if you have perl version 5.10 or newer!

#! /usr/bin/perl

use warnings;
use strict;

use feature 'switch';

my @patterns = (
  qr/foo/,
  qr/bar/,
  qr/baz/,
);

for (qw/ blurfl bar quux foo baz /) {
  print "$_: ";
  given ($_) {
    when (@patterns) {
      print "hit!\n";
    }
    default {
      print "miss.\n";
    }
  }
}

Although you don't see an explicit ~~ operator, Perl's given/when does it behind the scenes:

Most of the power comes from implicit smart matching:

when($foo)

is exactly equivalent to

when($_ ~~ $foo)

Most of the time, when(EXPR) is treated as an implicit smart match of $_ , i.e. $_ ~~ EXPR. (See Smart matching in detail for more information on smart matching.)

Smart matching in detail gives a table of many combinations you can use, and the above code corresponds to the case where $a is Any and $b is Array, which corresponds roughly to

grep $a ~~ $_, @$b

except the search short-circuits, i.e., returns quickly on a match rather than processing all elements. In the implicit loop then, we're smart matching Any against Regex, which is

$a =~ /$b/

Output:

blurfl: miss.
bar: hit!
quux: miss.
foo: hit!
baz: hit!
Greg Bacon
That smart match is very cool. I never thought of doing something like that. I'll add a similar example to the FAQ later today. :)
brian d foy
+1 Thanks! just one small thing: knowing there is a match is enough for me, so I can stop immediately after the first match was found and don't try to look for matches against remaining patterns. How can I change your code to do that?
David B
@David B The implicit loop in the smart match already stops (“short-circuits”) as soon as it finds a match. Do you mean the outer loop over `"blurfl"`, `"bar"`, and friends? If so, use `last` inside the `when` block.
Greg Bacon
@gbacon sorry, my mistake. thanks! (+1)
David B
+1: that is really great solution!
drewk
+5  A: 

From perlfaq6's answer to "How do I efficiently match many regular expressions at once?", in this case the latest development version that I just updated with a smart match example.


How do I efficiently match many regular expressions at once?

(contributed by brian d foy)

If you have Perl 5.10 or later, this is almost trivial. You just smart match against an array of regular expression objects:

my @patterns = ( qr/Fr.d/, qr/B.rn.y/, qr/W.lm./ );

if( $string ~~ @patterns ) {
    ...
    };

The smart match stops when it finds a match, so it doesn't have to try every expression.
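Smart match has been marked experimental since Perl 5.18, so newer perls warn about it. If that is a concern, the same short-circuiting test can be sketched with any from List::Util (available in List::Util 1.33 and later). The sub name matches_any and the test strings are only illustrative:

```perl
use strict;
use warnings;
use List::Util qw(any);

my @patterns = ( qr/Fr.d/, qr/B.rn.y/, qr/W.lm./ );

# Hypothetical helper: true if $string matches at least one pattern.
# any() stops at the first pattern for which the block returns true.
sub matches_any {
    my ($string, @regexes) = @_;
    return any { $string =~ $_ } @regexes;
}

for my $string (qw( Fred Barney Dino )) {
    my $verdict = matches_any($string, @patterns) ? 'hit' : 'miss';
    print "$string: $verdict\n";
}
```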

Earlier than Perl 5.10, you have a bit of work to do. You want to avoid compiling a regular expression every time you want to match it. In this example, perl must recompile the regular expression for every iteration of the foreach loop since it has no way to know what $pattern will be:

my @patterns = qw( foo bar baz );

LINE: while( <DATA> ) {
    foreach $pattern ( @patterns ) {
        if( /\b$pattern\b/i ) {
            print;
            next LINE;
            }
        }
    }

The qr// operator showed up in perl 5.005. It compiles a regular expression, but doesn't apply it. When you use the pre-compiled version of the regex, perl does less work. In this example, I inserted a map to turn each pattern into its pre-compiled form. The rest of the script is the same, but faster:

my @patterns = map { qr/\b$_\b/i } qw( foo bar baz );

LINE: while( <> ) {
    foreach $pattern ( @patterns ) {
        if( /$pattern/ )
            {
            print;
            next LINE;
            }
        }
    }

In some cases, you may be able to make several patterns into a single regular expression. Beware of situations that require backtracking though.

my $regex = join '|', qw( foo bar baz );

LINE: while( <> ) {
    print if /\b(?:$regex)\b/i;
    }
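One thing to watch when joining alternatives: Perl takes the first alternative that matches at a given position, not the longest one. The sketch below is an illustration, not part of the FAQ text. It assumes the patterns are literal strings, so it escapes them with quotemeta and sorts them longest first; the word list is made up:

```perl
use strict;
use warnings;

# Illustrative word list; "ba.z" contains a regex metacharacter.
my @words = qw( foo foobar ba.z );

# Longest first, with each word escaped so it matches literally.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @words;
my $regex = qr/(?:$alternation)/i;

for my $text ('a foobar here', 'ba.z', 'bazz') {
    if ( $text =~ /($regex)/ ) {
        print "$text: matched '$1'\n";
    }
    else {
        print "$text: no match\n";
    }
}
```

Without the sort, "foo" would win over "foobar" on the first line because it comes first in the alternation.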

For more details on regular expression efficiency, see Mastering Regular Expressions by Jeffrey Friedl. He explains how regular expression engines work and why some patterns are surprisingly inefficient. Once you understand how perl applies regular expressions, you can tune them for individual situations.

brian d foy
+3  A: 

If you are using a large number of regexps, you might be interested in Regexp::Optimizer.

From the synopsis section:

use Regexp::Optimizer;
my $o  = Regexp::Optimizer->new;
my $re = $o->optimize(qr/foobar|fooxar|foozap/);
# $re is now qr/foo(?:[bx]ar|zap)/

That might be more efficient, if you're willing to install an extra module.

Stephane
+7  A: 

My go-to for testing a value against multiple regexes at once is Regexp::Assemble, which will "Assemble multiple Regular Expressions into a single RE" in a manner somewhat more intelligent and optimized than simply doing a join '|', @regexps. You are also able, by default, to retrieve the portion of the text which matched and, if you need to know which pattern matched, the track switch will provide that information. Its performance is quite good - in one application, I'm using it to test against 1700 patterns at once - and I have yet to need anything that it doesn't do.

Dave Sherohman
See also: the [`assemble`](http://search.cpan.org/dist/Regexp-Assemble/eg/assemble) command line tool in that distribution which is not installed by default, and the improved [`Regexp::Assemble::Compressed`](http://p3rl.org/Regexp::Assemble::Compressed).
daxim