Hi, I'm looping through a series of regexes and matching each one against the lines in a file, like this:

for my $regex (@{$regexs_ref}) {
    LINE: for (@rawfile) {
        /@$regex/ && do {
            # do something here
            next LINE;
        };
    }
}

Is there a way for me to know how many matches I've got, so I can process them accordingly?

If not, maybe this is the wrong approach? Instead of looping through every regex, I could just write one recipe for each regex, but I don't know what the best practice is.

+2  A: 

If you do your matching in list context (i.e., basically assigning to a list), you get all of your matches and groupings in a list. Then you can just use that list in scalar context to get the number of matches (or, if the regex has capture groups, the number of captured substrings).

Or am I misunderstanding the question?

Example:

my @list = /$my_regex/g;
if (@list)
{
  # do stuff
  print "Number of matches: " . scalar @list . "\n";
}
Platinum Azure
Hmm, perhaps that's what I'm looking for. So @list now contains every match (i.e. $1, $2, $3, ...), or..?
Lenny Benny
Yup, `@list` has all the matches. (And of course, the size of the list is the number of matches!) Really, if you're going to do something with every capture/match, it's smartest to just iterate with `foreach` over that match list, kind of like how the C-style `for` is discouraged in Perl. :-)
Platinum Azure
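To make the difference between "number of matches" and "number of list elements" concrete, here is a minimal sketch. The helper names `count_matches` and `match_list` and the sample line are made up for illustration; they are not from the question's code. With capture groups and `/g`, the list holds every capture from every match, so its size can be larger than the match count.

```perl
use strict;
use warnings;

# Count how many times $regex matches in $text (matches, not captures).
sub count_matches {
    my ($text, $regex) = @_;
    my $count = 0;
    $count++ while $text =~ /$regex/g;   # scalar-context /g steps through matches
    return $count;
}

# Return the flat list produced by a list-context /g match.
sub match_list {
    my ($text, $regex) = @_;
    my @list = $text =~ /$regex/g;
    return @list;
}

my $line  = "a=1 b=2 c=3";               # hypothetical input line
my $regex = qr/(\w)=(\d)/;               # two capture groups per match
my @pairs = match_list($line, $regex);

print scalar(@pairs), " list elements, ",
      count_matches($line, $regex), " matches\n";
```

Under these assumptions the script should report 6 list elements and 3 matches, which is why iterating over the list (rather than trusting its size) is usually the safer choice when the pattern has groups.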
Excellent. Can I use my array_ref as a @list? Can you show me an example? Yes I'm going to do something with every capture/match, just as in http://code.google.com/p/codalyzer/source/browse/trunk/parser/Parser.pm
Lenny Benny
..only now I store all regexes in a database, and was looking for the "best practice"-way to do it. :-)
Lenny Benny
@Karl S Haugen: Can you please clarify what you mean? (I don't see an `array_ref` in the code above)
Platinum Azure
Whoever downvoted me: Could I please have an explanation?
Platinum Azure
I was thinking something like this: `my @hits; for my $regex (@{$regexs_ref}) { LINE: for (@rawfile) { @hits = ($_ =~ /@$regex/); } }` Of course I can change for to foreach :-). Is this what you were thinking?
Lenny Benny
Perhaps I need to re-think this. Maybe just having each regex explicitly typed is the best way..? Like a do in the link above. Basically what I want to do: parse a log file, do something with the matches, and then insert accordingly into a database. Best solution??
Lenny Benny
Yeah, that should basically work. I understand the `@{$regexs_ref}` part, but why do you have `/@$regex/` as your match pattern? Are you looking to match a literal `@` character before your regex? If so, you might want to escape the `@` sign, otherwise Perl might interpret that as an array dereference. (You don't need the brackets for a simple reference scalar, so Perl could get confused in the regex)
Platinum Azure
Isn't that the dereferenced array ref I get from fetchall_arrayref()? I have something like `my $regexs_ref = $sth->fetchall_arrayref();`
Lenny Benny
http://stackoverflow.com/questions/3612756/perl-best-practices-file-parser-using-regexes-and-database-storage
Lenny Benny
Yeah, I get the array reference part (for `$regexs_ref`). That's fine. It's the `/@$regex/` that I don't get. A regex pattern is a string, and therefore should be scalar. Basically, since you're doing two dereference steps, it feels like you are expecting `$regexs_ref` to carry a list of **more list references**, which in turn refer to lists. In reality you just have a reference to one list of regular expression patterns, if I'm understanding this correctly. Am I correct?
Platinum Azure
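For context on the double dereference discussed above: DBI's `fetchall_arrayref()` returns a reference to an array of rows, and each row is itself an array reference. Interpolating `@$regex` into a pattern joins the row's columns with `$"` (a space by default), so for a single-column row `/@$regex/` behaves like `/$regex->[0]/`. The sketch below makes that explicit without a database. The hard-coded rows, the helper name `count_per_pattern`, and the sample lines are stand-ins, and it assumes the pattern text is in the first column.

```perl
use strict;
use warnings;

# Stand-in for what $sth->fetchall_arrayref() returns: a reference to an
# array of rows, each row an array reference (hypothetical data).
my $regexs_ref = [
    [ 'foo(\d+)' ],
    [ 'bar' ],
];

# Count the lines matched by each row's pattern.
sub count_per_pattern {
    my ($rows, @lines) = @_;
    my %count;
    for my $row (@$rows) {
        my $pattern = $row->[0];          # assumed: first column is the pattern
        my $re      = qr/$pattern/;       # compile once per pattern
        $count{$pattern} = grep { /$re/ } @lines;   # grep in scalar context counts
    }
    return %count;
}

my %count = count_per_pattern($regexs_ref, 'foo12', 'bar', 'foo7 bar');
print "$_: $count{$_}\n" for sort keys %count;
```

Pulling the column out by index and compiling it with `qr//` keeps the intent visible and avoids surprises if a row ever carries more than one column.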
+2  A: 

You will need to keep track of that yourself. Here is one way to do it:

#!/usr/bin/perl

use strict;
use warnings;

my @regexes = (
    qr/b/,
    qr/a/,
    qr/foo/,
    qr/quux/,
);

my %matches = map { $_ => 0 } @regexes;
while (my $line = <DATA>) {
    for my $regex (@regexes) {
        next unless $line =~ /$regex/;
        $matches{$regex}++;
    }
}

for my $regex (@regexes) {
    print "$regex matched $matches{$regex} times\n";
}

__DATA__
foo
bar
baz
Chas. Owens
A: 

I'm not sure what you need, but this may help.

Use a counter.

my $counter = 0;   # start at 0 so the total is defined even when nothing matches
for my $regex (@{$regexs_ref}) {
    LINE: for (@rawfile) {
        /@$regex/ && do {
            $counter++;
            next LINE;
        };
    }
}
M42
Could the downvoter explain why? This counts the number of matches.
M42
I didn't downvote you, but I'm tempted to because your solution doesn't take into account how many times each regex matched, which most people who use this sort of approach would probably want at a bare minimum.
Platinum Azure
+1  A: 

In CA::Parser's processing associated with matches for /$CA::Regex::Parser{Kills}{all}/, you're using captures $1 all the way through $10, and most of the rest use fewer. If by the number of matches you mean the number of captures (the highest n for which $n has a value), you could use Perl's special @- array (emphasis added):

@LAST_MATCH_START

@-

$-[0] is the offset of the start of the last successful match. $-[n] is the offset of the start of the substring matched by n-th subpattern, or undef if the subpattern did not match. Thus after a match against $_, $& coincides with substr $_, $-[0], $+[0] - $-[0]. Similarly, $n coincides with

substr $_, $-[n], $+[n] - $-[n]

if $-[n] is defined, and $+ coincides with

substr $_, $-[$#-], $+[$#-] - $-[$#-]

One can use $#- to find the last matched subgroup in the last successful match. Contrast with $#+, the number of subgroups in the regular expression. Compare with @+.

This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope. $-[0] is the offset into the string of the beginning of the entire match. The n-th element of this array holds the offset of the nth submatch, so $-[1] is the offset where $1 begins, $-[2] the offset where $2 begins, and so on.

After a match against some variable $var:

  • $` is the same as substr($var, 0, $-[0])
  • $& is the same as substr($var, $-[0], $+[0] - $-[0])
  • $' is the same as substr($var, $+[0])
  • $1 is the same as substr($var, $-[1], $+[1] - $-[1])
  • $2 is the same as substr($var, $-[2], $+[2] - $-[2])
  • $3 is the same as substr($var, $-[3], $+[3] - $-[3])

Example usage:

#! /usr/bin/perl

use warnings;
use strict;

my @patterns = (
  qr/(foo(bar(baz)))/,
  qr/(quux)/,
);

chomp(my @rawfile = <DATA>);

foreach my $pattern (@patterns) {
  LINE: for (@rawfile) {
    /$pattern/ && do {
      my $captures = $#-;
      my $s = $captures == 1 ? "" : "s";
      print "$_: got $captures capture$s\n"; 
    };
  }
}

__DATA__
quux quux quux
foobarbaz

Output:

foobarbaz: got 3 captures
quux quux quux: got 1 capture
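
The substr equivalences quoted above can also be checked directly. The following is only a sketch: the helper name `capture_via_offsets` and the sample string are invented for illustration. It rebuilds capture group n from `@-` and `@+` immediately after a match, and returns nothing when the group did not take part in the match.

```perl
use strict;
use warnings;

# Match $var against $regex, then rebuild group $n (0 = whole match)
# from the offset arrays @- and @+ instead of reading $& or $n directly.
sub capture_via_offsets {
    my ($var, $regex, $n) = @_;
    return unless $var =~ $regex;
    return unless defined $-[$n];        # group did not participate in the match
    return substr($var, $-[$n], $+[$n] - $-[$n]);
}

my $var = 'xx foobarbaz yy';             # hypothetical input
my $re  = qr/(foo(bar(baz)))/;

for my $n (0 .. 3) {
    print "$n: ", capture_via_offsets($var, $re, $n), "\n";
}
```

With this input the loop should print the whole match for 0 and 1, then `barbaz` and `baz` for groups 2 and 3.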
Greg Bacon
Thanks, this helped me a lot. =)
Lenny Benny
@Karl You're welcome! I'm glad it helped.
Greg Bacon