Hi,
I thought I understood Perl RE to a reasonable extent, but this is puzzling me:

#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

if($test =~ /\'?(.*?)\'?/) {
       print "Captured $1\n";
       print "Matched $&";
}
else {
       print "What?!!";
}

prints

Captured
Matched '

It seems it has matched the ending ' alone, and so captured nothing.
I would have expected it to match the entire thing or, if it's totally non-greedy, nothing at all (as everything there is an optional match).
This in-between behaviour baffles me. Can anyone explain what is happening?
Thanks, Sundar.

+2  A: 

pattern? is greedy; if you want it to be non-greedy, you must say pattern??:

#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

if($test =~ /\'?(.*?)\'?/) {
       print "Captured [$1]\n";
       print "Matched  [$&]\n";
}
if($test =~ /\'??(.*?)\'??/) {
       print "Captured [$1]\n";
       print "Matched  [$&]\n";
}

from perldoc perlre:

The following standard quantifiers are recognized:

*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times

By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?". Note that the meanings don’t change, just the "greediness":

*?     Match 0 or more times
+?     Match 1 or more times
??     Match 0 or 1 time
{n}?   Match exactly n times
{n,}?  Match at least n times
{n,m}? Match at least n but not more than m times
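
A minimal sketch of the difference between ? and ??, not part of the perldoc quote. The sample string "ab" and the two capture groups are chosen purely for illustration. With the greedy a? the first group takes the a because it can; with the non-greedy a?? it takes nothing, because the rest of the pattern can still match without it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative input; any string starting with "a" would do.
my $s = "ab";

# Greedy: a? takes the "a" because it can.
my @greedy = $s =~ /^(a?)(.*)$/;

# Non-greedy: a?? takes nothing, leaving the "a" for the second group.
my @lazy = $s =~ /^(a??)(.*)$/;

print "greedy: [$greedy[0]] [$greedy[1]]\n";
print "lazy:   [$lazy[0]] [$lazy[1]]\n";
```

Both patterns match the whole string; only the split between the two groups changes, which is what "the meanings don't change, just the greediness" amounts to.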
Chas. Owens
Nope, perl regex is greedy by default and ? makes them non-greedy.
Ed Guiness
Um, read what I said again, pattern? is greedy (because that is the default), to get non-greedy you must say pattern??.
Chas. Owens
From the perldoc you quoted: If you want it to match the minimum number of times possible, follow the quantifier with a "?".
Ed Guiness
yes, you must follow the _quantifier_ with ?. The pattern is not the quantifier. The quantifier in this case is ? which is the same as the {0,1} quantifier. To get non-greedy optional matches you must say pattern??, that is pattern, quantifier (in this case ?), and then non-greedy ?.
Chas. Owens
It is right there in the freaking perldoc I quoted, fourth line from the bottom!
Chas. Owens
you're confusing ?? (which means "0 or 1" ) with ? (which means "zero or more"). ?? does not mean "non-greedy", it means "zero or one"
Ed Guiness
@edg: No, x?? means match x zero or one times but non-greedily, just as x? means match x zero or one times but greedily.
Simon Nickerson
Edit: fixing some formatting
Chris Lutz
The first table in the quote is the quantifiers. ? means match 0 or 1 of the pattern that preceded it. The second table is the first table made non-greedy by the addition of a ? to the quantifier. In order to get a 0 or 1 match that is non-greedy you must say ?? or {0,1}?.
Chas. Owens
The questioner was confused by the fact that a match occurred at all because he/she thought that pattern? was non-greedy. I stated that pattern? is greedy and that if you want it to be non-greedy you must say pattern??; you proceeded to disagree with me, and then stated my point yourself.
Chas. Owens
Oh, wait, that was simonn, nevermind, you seem to still be confused.
Chas. Owens
@Chris Lutz, that is the way it is formatted in the perldoc, is there a reason you feel having everything flush against the side is better?
Chas. Owens
I honestly don't get it, what is there to be greedy in /'?/ ? "? Match 1 or 0 times" and "?? Match 0 or 1 time" seem the same to me... The greedy/non-greedy comes only when you have things like * or + that can match any number of times.
sundar
I don't know how to format newlines in comments, but let me try: I honestly don't get it, what is there to be greedy in /'?/ ? "? Match 1 or 0 times" and "?? Match 0 or 1 time" seem the same to me. The greedy/non-greedy comes only when you have things like * or + that can match any number of times.
sundar
@sundar: The ? is a shorthand for {0,1}. The ?? is a shorthand for {0,1}?. The former matches if it can (=> greedy), the latter matches only if it must (=> non-greedy).
Tomalak
@sundar, pattern? will match the pattern if it can, pattern?? will match the pattern only if it is necessary for the match to be successful. I will add an example to the answer.
Chas. Owens
I have difficulty coming up with a good example that demonstrates the usefulness of ??, I don't tend to use it.
Chas. Owens
+2  A: 

I think you mean something like:

/'(.*?)'/      # matches everything in single quotes

or

/'[^']*'/      # matches everything in single quotes, but faster

The single quotes don't need to be escaped, AFAIK.
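
A small sketch, run against the string from the question, to check that both forms pick out the same text. The capture group in the second pattern and the variable names are additions made here for the comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

# Lazy dot-star between two mandatory quotes.
my ($lazy) = $test =~ /'(.*?)'/;

# Negated character class: anything except a quote.
# (The parentheses are added here so the content can be compared.)
my ($negated) = $test =~ /'([^']*)'/;

print "lazy:    [$lazy]\n";
print "negated: [$negated]\n";
```

Because the quotes are mandatory in both patterns, neither can succeed on an empty match the way the all-optional original does.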

Tomalak
+1  A: 

Beware of making all elements of your regex optional (i.e. having all elements quantified with * or ? ). This lets the Perl regex engine match as much as it wants (even nothing), while still considering the match successful.

I suspect what you want is

/'(.*?)'/
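
To illustrate the warning, here is a sketch that runs the all-optional pattern from the question against three made-up inputs, including one with no quotes and an empty string. The inputs and the %seen bookkeeping are purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Record what the all-optional pattern matches for each input.
my %seen;
for my $s ("'quoted'", "no quotes here", "") {
    if ($s =~ /'?(.*?)'?/) {
        $seen{$s} = { matched => $&, captured => $1 };
        print "input [$s]: matched [$&], captured [$1]\n";
    }
}
```

The pattern succeeds on every input, and the capture is empty each time, so a successful match says nothing about whether any quoted text was found.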
kixx
+10  A: 

The \'? at the beginning and end means match 0 or 1 apostrophes greedily. (As another poster has pointed out, to make it non-greedy, it would have to be \'??)

The .*? in the middle means match 0 or more characters non-greedily.

The Perl regular expression engine starts at the first character of the string. The leading \'? is greedy, so it consumes the opening apostrophe. The (.*?) then matches non-greedily, so it takes as little as it can, which is nothing. The trailing \'? would take an apostrophe if one came next, but the next character is s, so it matches the empty string. The overall match is therefore just the opening apostrophe, and $1 is empty.
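
One way to confirm which apostrophe was matched is to look at the match offsets. This is a sketch using Perl's built-in @- and @+ arrays, which hold the start and end offsets of the whole match (index 0) and of each capture group; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

my ($m_start, $m_end, $c_start, $c_end);
if ($test =~ /\'?(.*?)\'?/) {
    # Index 0 is the whole match, index 1 is the first capture group.
    ($m_start, $m_end) = ($-[0], $+[0]);
    ($c_start, $c_end) = ($-[1], $+[1]);
    print "whole match: from offset $m_start to $m_end\n";
    print "capture 1:   from offset $c_start to $c_end\n";
}
```

The whole match runs from offset 0 to 1, which is the opening apostrophe, and the capture starts and ends at offset 1, which is why $1 is empty.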

Simon Nickerson
Thank you for the very clear explanation.
sundar
In other words, only the *beginning* apostrophe was matched, the rest of the regex matches the empty string.
+1  A: 

I would say the closest answer to what you are looking for is

/'?([^']*)'?/

So "get the single quote if it's there", "get anything and everything that's not a single quote", "get the last single quote if it's there".

Unless you want to match "'don't do this'" - but who uses an apostrophe in a single quote anyway (and gets away with it for long)? :)
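
A quick sketch of how this pattern behaves on the string from the question and on the apostrophe case mentioned above. The %result hash is only there to collect the output for inspection.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the whole match and the capture for each input.
my %result;
for my $s ("'some random string'", "'don't do this'") {
    if ($s =~ /'?([^']*)'?/) {
        $result{$s} = { matched => $&, captured => $1 };
        print "input [$s]: matched [$&], captured [$1]\n";
    }
}
```

On the first input the capture is the full quoted text; on the second, the match stops at the apostrophe inside don't, so only don is captured.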

Rini