ansaurus

Question

Perl regex: how to know number of matches

Answer 1

+2 A:

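If you do your matching in list context (i.e., basically assigning the result to a list), you get all of your matches and captured groups in a list. Then you can just use that list in scalar context to get the number of matches. Strictly, that gives the number of list elements: if the pattern has capturing groups, each match contributes one element per group.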

Or am I misunderstanding the question?

Example:

my @list = /$my_regex/g;
if (@list)
{
  # do stuff
  print "Number of matches: " . scalar @list . "\n";
}

Platinum Azure 2010-08-31 15:05:01
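A small sketch of how the list size relates to the match count (the sample string and patterns are made up for illustration). Without capturing groups each list element is one match. With groups, each match contributes one element per group, so the array size counts captures instead. A `//g` loop in scalar context counts matches either way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'cat bat rat';

# No capturing groups: //g in list context returns every full match.
my @words = $text =~ /\w+at/g;          # ('cat', 'bat', 'rat')
my $word_count = @words;                # array in scalar context: 3

# Two capturing groups: each match contributes BOTH captured substrings,
# so the list holds captures, not matches.
my @captures = $text =~ /(\w)(at)/g;    # ('c', 'at', 'b', 'at', 'r', 'at')
my $capture_count = @captures;          # 6, not 3

# To count matches regardless of groups, iterate //g in scalar context.
my $match_count = 0;
$match_count++ while $text =~ /(\w)(at)/g;   # 3

print "words: $word_count, captures: $capture_count, matches: $match_count\n";
```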

Hmm, perhaps this is what I'm looking for. Does @list now contain every match (i.e. $1, $2, $3, ...)?

Lenny Benny 2010-08-31 15:31:00

Yup, `@list` has all the matches. (And of course, the size of the list is the number of matches!) Really, if you're going to do something with every capture/match, it's smartest to just iterate with `foreach` over that match list, kind of like how the C-style `for` is discouraged in Perl. :-)

Platinum Azure 2010-08-31 15:34:42

Excellent. Can I use my array_ref as @list? Can you show me an example? Yes, I'm going to do something with every capture/match, just as in http://code.google.com/p/codalyzer/source/browse/trunk/parser/Parser.pm

Lenny Benny 2010-08-31 17:33:25

...except that now I store all the regexes in a database, and I was looking for the "best practice" way to do it. :-)

Lenny Benny 2010-08-31 17:46:39

@Karl S Haugen: Can you please clarify what you mean? (I don't see an `array_ref` in the code above.)

Platinum Azure 2010-08-31 19:02:17

Whoever downvoted me: Could I please have an explanation?

Platinum Azure 2010-08-31 19:06:00

I was thinking something like this:

my @hits;
for my $regex (@{$regexs_ref}) {
    LINE: for (@rawfile) {
        @hits = ($_ =~ /@$regex/);
    }
}

Of course I can change for to foreach :-). Is this what you were thinking?

Lenny Benny 2010-08-31 19:15:03

Perhaps I need to rethink this. Maybe having each regex explicitly typed out is the best way, as I do in the link above? Basically, what I want to do is parse a log file, do something with the matches, and then insert the results into a database. What's the best solution?

Lenny Benny 2010-08-31 19:21:14

Yeah, that should basically work. I understand the `@{$regexs_ref}` part, but why do you have `/@$regex/` as your match pattern? Are you looking to match a literal `@` character before your regex? If so, you might want to escape the `@` sign; otherwise Perl might interpret it as an array dereference. (You don't need the braces to dereference a simple scalar reference, so Perl could get confused in the regex.)

Platinum Azure 2010-08-31 19:23:13

Isn't that the dereferenced array ref I get from fetchall_arrayref()? I have something like `my $regexs_ref = $sth->fetchall_arrayref();`

Lenny Benny 2010-08-31 19:35:44

http://stackoverflow.com/questions/3612756/perl-best-practices-file-parser-using-regexes-and-database-storage

Lenny Benny 2010-08-31 20:05:52

Yeah, I get the array reference part (for `$regexs_ref`). That's fine. It's the `/@$regex/` that I don't get. A regex pattern is a string, and therefore should be a scalar. Basically, since you're doing two dereference steps, it feels like you are expecting `$regexs_ref` to carry a list of **more list references**, which in turn refer to lists. In reality you just have a reference to one list of regular expression patterns, if I'm understanding this correctly. Am I correct?

Platinum Azure 2010-08-31 20:50:02
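For reference, DBI's `fetchall_arrayref()` called with no arguments returns a reference to an array of rows, and each row is itself an array reference. So there really are two levels of references, and `/@$regex/` interpolates the row's fields joined by `$"` (a single space by default). For a one-column row, that is just the pattern. The sketch below assumes a query that selects a single pattern column. It fakes that structure with a literal instead of a real database (the query, table, and sample lines are hypothetical) and takes the pattern out explicitly with `$row->[0]`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for $sth->fetchall_arrayref() on a hypothetical one-column query
# (e.g. "SELECT pattern FROM regexes"): a reference to an array of rows,
# where each row is itself an array reference.
my $regexs_ref = [ ['foo'], ['ba.'] ];

my @rawfile = ('foo', 'bar', 'baz', 'quux');   # hypothetical log lines

my %count;
for my $row (@{$regexs_ref}) {
    # The pattern is the first (only) column of the row. Interpolating the
    # whole row would give the same string here, since it has one element.
    my $pattern = $row->[0];
    $count{$pattern} = 0;
    for my $line (@rawfile) {
        $count{$pattern}++ if $line =~ /$pattern/;
    }
    print "$pattern matched $count{$pattern} line(s)\n";
}
```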

Answer 2

+2 A:

You will need to keep track of that yourself. Here is one way to do it:

#!/usr/bin/perl

use strict;
use warnings;

my @regexes = (
    qr/b/,
    qr/a/,
    qr/foo/,
    qr/quux/,
);

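# One counter per pattern; the qr// objects stringify to usable hash keys.
my %matches = map { $_ => 0 } @regexes;
while (my $line = <DATA>) {
    for my $regex (@regexes) {
        # Counts matching lines: a line is counted once per regex,
        # however many times the pattern occurs in it.
        next unless $line =~ /$regex/;
        $matches{$regex}++;
    }
}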

for my $regex (@regexes) {
    print "$regex matched $matches{$regex} times\n";
}

__DATA__
foo
bar
baz

Chas. Owens 2010-08-31 15:08:42
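The loop above counts matching lines, so a line containing a pattern twice is counted once. If you want every occurrence instead, a common variant is the `my $n = () = $str =~ /.../g;` idiom. It evaluates `//g` in list context and then takes the element count of that list assignment. Here is a minimal sketch (patterns and sample lines made up for illustration) that tracks both counts side by side.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @regexes = (qr/b/, qr/a/, qr/o/);
my @lines   = ('foo', 'bar', 'baz');     # sample input, made up for illustration

my %lines_matched = map { $_ => 0 } @regexes;
my %occurrences   = map { $_ => 0 } @regexes;

for my $line (@lines) {
    for my $regex (@regexes) {
        # Per-line count, as in the answer above: at most 1 per line.
        $lines_matched{$regex}++ if $line =~ $regex;

        # Per-occurrence count: //g in list context returns one element per
        # match (these patterns have no capturing groups), and assigning that
        # to an empty list in scalar context yields how many elements there were.
        my $n = () = $line =~ /$regex/g;
        $occurrences{$regex} += $n;
    }
}

for my $regex (@regexes) {
    print "$regex: $lines_matched{$regex} line(s), $occurrences{$regex} occurrence(s)\n";
}
```

With capturing groups in the pattern, `$n` would count captured substrings rather than matches, so use a scalar-context `while (//g)` loop in that case.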

Answer 3

A:

I'm not sure exactly what you need, but this may help.

Use a counter.

my $counter = 0;    # total number of (regex, line) pairs that matched
for my $regex (@{$regexs_ref}) {
    LINE: for (@rawfile) {
        /@$regex/ && do {
            $counter++;
            next LINE;
        };
    }
}

M42 2010-08-31 15:12:08

Could the downvoter explain why? This counts the number of matches.

M42 2010-08-31 18:40:45

I didn't downvote you, but I'm tempted to because your solution doesn't take into account how many times each regex matched, which most people who use this sort of approach would probably want at a bare minimum.

Platinum Azure 2010-08-31 19:07:54

Answer 4

+1 A:

In CA::Parser's processing associated with matches for /$CA::Regex::Parser{Kills}{all}/, you're using captures $1 all the way through $10, and most of the rest use fewer. If by the number of matches you mean the number of captures (the highest n for which $n has a value), you could use Perl's special @- array (emphasis added):

@LAST_MATCH_START

@-

$-[0] is the offset of the start of the last successful match. $-[n] is the offset of the start of the substring matched by n-th subpattern, or undef if the subpattern did not match. Thus after a match against $_, $& coincides with substr $_, $-[0], $+[0] - $-[0]. Similarly, $n coincides with
substr $_, $-[n], $+[n] - $-[n]
if $-[n] is defined, and $+ coincides with
substr $_, $-[$#-], $+[$#-] - $-[$#-]
One can use $#- to find the last matched subgroup in the last successful match. Contrast with $#+, the number of subgroups in the regular expression. Compare with @+.

This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope. $-[0] is the offset into the string of the beginning of the entire match. The n-th element of this array holds the offset of the nth submatch, so $-[1] is the offset where $1 begins, $-[2] the offset where $2 begins, and so on.

After a match against some variable $var:

$` is the same as substr($var, 0, $-[0])

$& is the same as substr($var, $-[0], $+[0] - $-[0])

$' is the same as substr($var, $+[0])

$1 is the same as substr($var, $-[1], $+[1] - $-[1])

$2 is the same as substr($var, $-[2], $+[2] - $-[2])

$3 is the same as substr($var, $-[3], $+[3] - $-[3])

Example usage:

#! /usr/bin/perl

use warnings;
use strict;

my @patterns = (
  qr/(foo(bar(baz)))/,
  qr/(quux)/,
);

chomp(my @rawfile = <DATA>);

foreach my $pattern (@patterns) {
  LINE: for (@rawfile) {
    /$pattern/ && do {
      my $captures = $#-;
      my $s = $captures == 1 ? "" : "s";
      print "$_: got $captures capture$s\n"; 
    };
  }
}

__DATA__
quux quux quux
foobarbaz

Output:

foobarbaz: got 3 captures
quux quux quux: got 1 capture

Greg Bacon 2010-08-31 15:47:03

Thanks, this helped me a lot. =)

Lenny Benny 2010-08-31 17:34:54

@Karl You're welcome! I'm glad it helped.

Greg Bacon 2010-08-31 18:19:11
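One caveat with using `$#-` as a capture count is that it gives the index of the last group that actually matched. By contrast, `$#+` gives the number of groups in the pattern. The two differ when trailing optional groups do not participate. The sketch below (pattern and input invented for illustration) shows this, and also checks the `substr` equivalence for `$1` quoted in the documentation above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $var = 'xac';
my ($last_matched, $total_groups, $g1);

# Three groups; the trailing (b)? does not participate in this match.
if ($var =~ /(a)(c)(b)?/) {
    $last_matched = $#-;                          # 2: last group that matched
    $total_groups = $#+;                          # 3: groups in the pattern
    $g1 = substr($var, $-[1], $+[1] - $-[1]);     # same string as $1
    print "last matched group: $last_matched, groups in pattern: $total_groups, \$1 = $g1\n";
}
```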
