Why doesn't this look-behind assertion work when it's anchored to the front of the string? Run the following code and you'll see that the first test passes but the second, which varies only by the ^ anchor, fails.

use Test::More tests => 2;

my $s = '/123/456/hello';    
$s =~ s{(?<=/)\d+(?=/\d+/hello)}{0};  # unanchored
is($s, '/0/456/hello', 'unanchored'); # passes

$s = '/123/456/hello';
$s =~ s{^(?<=/)\d+(?=/\d+/hello)}{0}; # anchored
is($s, '/0/456/hello', 'anchored');   # fails

Moving the ^ into the look-behind assertion isn't an option for me (this is an extremely simplified example) but that does fix the problem. I've found an alternate way to do what I want, but I'm curious why this approach didn't work. I've tested this on perl 5.8.8 and perl 5.10.0.

+9  A: 

Remember that the assertion is zero-width and doesn't consume the characters it matches. So the anchor has to go inside the assertion, otherwise the whole expression doesn't match.
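
A minimal sketch of that fix, reusing the string from the question (the pattern here is only an illustration, not the asker's real code). With `^` inside the look-behind, the assertion requires that the slash just before the digits is also the first character of the string:

```perl
use strict;
use warnings;

my $s = '/123/456/hello';

# The look-behind now asserts "start of string, then a slash" immediately
# before the digits, rather than demanding that the digits begin at
# offset 0 and also have a slash in front of them.
$s =~ s{(?<=^/)\d+(?=/\d+/hello)}{0};

print "$s\n";    # /0/456/hello
```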

Mark Byers
Good point. It can't both be the start of the string and have a '/' preceding it.
Axeman
The look-behind assertion is looking behind the `\d+`, not the `^`. It doesn't match because the assertion doesn't consume the characters it matches, leaving the "effective regex" after the assertions are applied as `^\d+`, which of course doesn't match `/123/456/hello`.
John Siracusa
+1  A: 

There's nothing before the front of the string, so any nonempty positive lookbehind will fail when anchored with ^.

Charles
+4  A: 

I would guess it's because (?<= is a positive look-behind (not a negative one), and you can't have a character before the start of the string. If you are after a negative look-behind, you should be using (?<! instead.

Cags
The positive/negative thing was an error in the question; fixed now.
John Siracusa
+6  A: 

(?<=/)\d+(?=/hello) on your string matches 456, as it is the only part of the string that both lookarounds apply to. When you anchor your expression, it can no longer match anything. A lookaround is zero-width, so your second pattern says "match one or more digits starting from the beginning of the string, where the preceding character is a slash", which obviously is not possible.
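
As a rough illustration of both cases, here is a small sketch using the pattern quoted in this answer; the variable names `$unanchored` and `$anchored` are made up for the example:

```perl
use strict;
use warnings;

my $s = '/123/456/hello';

# Unanchored: the engine may try every position, and only 456 has a
# slash behind it and "/hello" directly ahead of it.
my ($unanchored) = $s =~ m{((?<=/)\d+(?=/hello))};

# Anchored: ^ pins the match to offset 0, where nothing precedes the
# current position, so the look-behind for a slash cannot succeed.
my $anchored = $s =~ m{^(?<=/)\d+(?=/hello)} ? 1 : 0;

print "unanchored match: $unanchored\n";    # 456
print "anchored matched: $anchored\n";      # 0
```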

Daniel Vandersluis
all the answers are more or less saying the same thing but this one seems clearest
ysth