In Python, compiled regex patterns have a findall method that does the following:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

What's the canonical way of doing this in Perl? A naive algorithm I can think of is along the lines of "while a search and replace with the empty string is successful, do [suite]". I'm hoping there's a nicer way. :-)

Thanks in advance!

+9  A: 

Use the /g modifier in your match. From the perlop manual:

The "/g" modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.

In scalar context, each execution of "m//g" finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see "pos" in perlfunc. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the "/c" modifier (e.g. "m//gc"). Modifying the target string also resets the search position.
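A minimal sketch of the two contexts described in the quoted perlop text, using the sample sentence that appears later in this thread. The variable names ($text, @vowels, @flat, @ends) are illustrative and not from perlop.

```perl
use strict;
use warnings;

my $text = 'I had a sandwich for lunch';

# List context, no capturing parentheses: every whole match is returned.
my @vowels = $text =~ /[aeiou]/g;

# List context, two capturing groups: the captures come back as one flat list.
my @flat = $text =~ /([aeiou])([nrs])/g;

# Scalar context: each evaluation finds the next match, and pos() reports
# the offset just past that match.
my @ends;
while ( $text =~ /([aeiou])([nrs])/g ) {
    push @ends, pos($text);
}

print "@vowels\n";    # a a a i o u
print "@flat\n";      # a n o r u n
print "@ends\n";      # 11 20 24
```

The list-context form is the closest one-liner to findall. The scalar-context loop is the one to reach for when each match needs its own handling or its position.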

Chris Jester-Young
D'oh -- of course! I should have realized this coming from Vim land.
cdleary
+5  A: 

To build on Chris's response, the most direct equivalent is probably to wrap the //g match in a while loop, like this:

my @matches;
# In scalar context, each //g match resumes where the previous one stopped.
while ( 'foobarbaz' =~ m/([aeiou])/g )
{
    push @matches, $1;    # collects o, o, a, a
}

Pasting some quick Python I/O:

>>> import re
>>> re.findall(r'([aeiou])([nrs])','I had a sandwich for lunch')
[('a', 'n'), ('o', 'r'), ('u', 'n')]

To get something comparable in Perl, the construct could be something like:

my $matches = [];
while ( 'I had a sandwich for lunch' =~ m/([aeiou])([nrs])/g )
{
    # Store each match as a two-element array ref, like Python's tuples.
    push @$matches, [$1,$2];
}

In general, though, whatever you wanted the list of matches for can probably be done inside the while loop itself.
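As a rough illustration of doing the work inside the loop, the sketch below tallies each vowel-plus-consonant pair as it is found instead of collecting the matches first. The %count hash and the tallying task are made up for this example.

```perl
use strict;
use warnings;

my $text = 'I had a sandwich for lunch';

# Tally each pair as it is matched; no intermediate list of matches is kept.
my %count;
while ( $text =~ /([aeiou])([nrs])/g ) {
    $count{"$1$2"}++;
}

# Print the tallies in sorted key order.
printf "%s: %d\n", $_, $count{$_} for sort keys %count;
```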

kyle
But what about `@matches = 'I had a sandwich for lunch' =~ m/([aeiou])([nrs])/g`? Sure, you get a flattened array, but then you can splice that off two apiece (in this case). :-)
Chris Jester-Young
Ah-hm. The beauty of Perl is there's always another way! I'm glad I said, "could be something like" :)
kyle
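For reference, the flattened-list approach from the comment above could be regrouped into pairs roughly like this. It is only a sketch: @flat, @pairs, and the splice-in-a-loop idiom are one illustrative way to do it, not something spelled out in the thread.

```perl
use strict;
use warnings;

# List context returns the captures as one flat list: a n o r u n
my @flat = 'I had a sandwich for lunch' =~ m/([aeiou])([nrs])/g;

# Peel two captures at a time off the front of the flat list.
my @pairs;
while ( my @pair = splice @flat, 0, 2 ) {
    push @pairs, [@pair];
}

print join( ', ', map { "($_->[0], $_->[1])" } @pairs ), "\n";
# (a, n), (o, r), (u, n)
```

Note that splice empties @flat as it goes, so copy the list first if the flat form is still needed afterwards.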
+2  A: 

Nice beginner reference with similar content to @kyle's answer: Perl Tutorial: Using regular expressions

cdleary