I expected this to print "[b]" but it prints "[]":

$x = "abc";
$x =~ /(b*)/;
print "[$1]";

If the star is replaced with a plus, it acts as I expect. Aren't both plus and star supposed to be greedy?

ADDED: Thanks everyone for pointing out (within seconds, it seemed!) that "b*" matches the empty string, and the first place it can do so is at the very start of the string. So greediness is not the issue at all: the empty match succeeds before the engine ever reaches the first 'b'.
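
For anyone who wants to see the two quantifiers side by side, here is a minimal sketch. The helper name first_capture is made up for this example; everything else is the snippet from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first capture group, or undef on no match.
sub first_capture {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $1 : undef;
}

my $x = "abc";

# b* is allowed to match zero b's, so it succeeds immediately at the 'a'.
printf "star: [%s]\n", first_capture($x, qr/(b*)/);   # star: []

# b+ needs at least one b, so the engine has to move on to the 'b'.
printf "plus: [%s]\n", first_capture($x, qr/(b+)/);   # plus: [b]
```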

+3  A: 

The regex tries to match at a, finds zero b's there, and succeeds with an empty match, so $1 is empty and matching ends there. With the + quantifier an empty match is not allowed, so the regex cannot match at a; it moves on and the value of $1 becomes b.

Blixt
Not quite correct. It matches and terminates at `a`, not `c`.
chaos
Ah right, I was thinking of it as a global match. Corrected.
Blixt
+10  A: 

The pattern matches and returns at the first position where b* can succeed, i.e. it performs a zero-width match at a. To illustrate more clearly what's going on, try this:

$x = "zabc";
$x =~ /(.b*)/;
print "[$1]";
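
A runnable variant of the same experiment, as a rough sketch. The helper name dot_then_bs and the second input "zbbc" are additions for contrast, not part of the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns what (.b*) captures in the given string.
sub dot_then_bs {
    my ($string) = @_;
    return $string =~ /(.b*)/ ? $1 : undef;
}

# "zbbc" is an extra input added here for contrast.
for my $string ("zabc", "zbbc") {
    printf "%s: [%s]\n", $string, dot_then_bs($string);
}
# zabc: [z]
# zbbc: [zbb]
```

In "zabc" the dot consumes the z, and b* then matches zero b's because the next character is a. Only when b's follow directly, as in "zbbc", does the star get anything to be greedy about.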
chaos
+3  A: 

The regex matches at the earliest point in the string that it can. In the case of 'abc' =~ /(b*)/, that point is right at the beginning of the string where it can match zero b's. If you had tried to match 'bbc', then you would have printed:

[bb]

Adrian Pronk
+10  A: 

It is greedy, but b* will match the empty string. In fact, anything* can always match the empty string, so:

  "abc"
  /\
     --- matches the empty string here.

If you print $' you'll see it's abc, which is the rest of the string after the match. Greediness just means that in the case of "bbb", you get "bbb", and not "b" or "bb".
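
A small sketch of that check, assuming you want the text before, inside, and after the match in one go. The helper name split_around_match is made up for this example; $`, $& and $' are Perl's standard prematch, match, and postmatch variables.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text before, inside, and after the
# first match of b* in the given string.
sub split_around_match {
    my ($string) = @_;
    return unless $string =~ /b*/;
    my ($before, $match, $after) = ($`, $&, $');
    return ($before, $match, $after);
}

printf "before=[%s] match=[%s] after=[%s]\n", split_around_match("abc");
# before=[] match=[] after=[abc]

printf "before=[%s] match=[%s] after=[%s]\n", split_around_match("bbb");
# before=[] match=[bbb] after=[]
```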

Logan Capaldo
I see. So greediness is not the issue at all. It never has a chance to greedily match the string of b's, since it matches the empty string at the very beginning of the string before it even gets to the b's.
dreeves
You are correct, sir.
chaos
A: 

A * at the end of a pattern is almost never what you want. We even have this as a trick question in Learning Perl to illustrate just this problem.
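
To make the pitfall concrete, here is a minimal sketch (not the Learning Perl exercise itself). The helper name matches and the inputs "xyz" and "" are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether the regex matches anywhere in the string.
sub matches {
    my ($string, $regex) = @_;
    return $string =~ $regex ? "yes" : "no";
}

# "xyz" and "" are inputs made up for this sketch; neither contains a b.
for my $string ("abc", "xyz", "") {
    printf "[%s] b*: %s, b+: %s\n",
        $string, matches($string, qr/b*/), matches($string, qr/b+/);
}
# [abc] b*: yes, b+: yes
# [xyz] b*: yes, b+: no
# [] b*: yes, b+: no
```

Because b* on its own can never fail, using it as a test for "does this string contain b's" tells you nothing; b+ is usually what was meant.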

brian d foy
A: 

Matching as early as possible has a higher priority than the length of the match (AFAIR this is the case for Perl's regex matching engine, which is an NFA). Therefore a zero-length match at the start of the string wins over a longer match later in the string.
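
A rough sketch of that priority, using Perl's @- and @+ arrays (start and end offsets of the last successful match). The helper name first_match_span and the input "abbb" are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start offset, length) of the first match.
sub first_match_span {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;
    my ($start, $end) = ($-[0], $+[0]);
    return ($start, $end - $start);
}

# "abbb" is an illustrative input: a three-character match is available at
# offset 1, but b* already succeeds with length 0 at offset 0.
printf "b*: offset %d, length %d\n", first_match_span("abbb", qr/b*/);
# b*: offset 0, length 0

printf "b+: offset %d, length %d\n", first_match_span("abbb", qr/b+/);
# b+: offset 1, length 3
```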

For more information search for "DFA vs NFA" in this article about regex matching engines.

kixx