For example, the regex below fails with an error reporting that the lookbehind assertion is not fixed length:

#(?<!(?:(?:src)|(?:href))=["\']?)((?:https?|ftp)://[^\s\'"<>()]+)#S

No such restriction exists for lookahead.
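The asymmetry is easy to reproduce outside PCRE. The sketch below uses Python's `re` module, which is a different engine from the one in the question but imposes a similar fixed-width rule on lookbehind. The helper name `compiles` and the simplified patterns are made up for illustration.

```python
import re

def compiles(pattern):
    """Return True if the engine accepts the pattern, False if it rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Variable-width lookbehind: "src" and "href" differ in length, and the
# trailing quote is optional, so the width of the assertion is not fixed.
lookbehind_ok = compiles(r'(?<!(?:src|href)=["\']?)https?://\S+')

# The same variable-width subpattern used inside a lookahead.
lookahead_ok = compiles(r'https?://\S+(?!(?:src|href)=["\']?)')

print(lookbehind_ok, lookahead_ok)
```

With Python's `re`, the lookbehind version is rejected at compile time while the lookahead version is accepted.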

+2  A: 

First of all, this isn't true of all regular expression libraries; .NET, for example, allows variable-length lookbehind.

For PCRE, the reason appears to be:

The implementation of lookbehind assertions is, for each alternative, to temporarily move the current position back by the fixed width and then try to match.

(at least, according to http://www.autoitscript.com/autoit3/pcrepattern.html).
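A rough way to picture the strategy quoted above, written in Python rather than taken from PCRE's source: each alternative carries a known width, so the check is "jump back that many characters and try to match". The function name `fixed_width_lookbehind` and the `(width, pattern)` pairs are assumptions made for this sketch.

```python
import re

def fixed_width_lookbehind(text, pos, alternatives):
    """Emulate a lookbehind whose alternatives each have a known fixed width.

    alternatives is a list of (width, pattern) pairs (hypothetical format).
    """
    for width, pattern in alternatives:
        start = pos - width          # move back by the fixed width
        if start < 0:
            continue                 # not enough text behind the position
        if re.fullmatch(pattern, text[start:pos]):
            return True              # this alternative matched
    return False

alts = [(4, r'src='), (5, r'href=')]
print(fixed_width_lookbehind('src=http://a', 4, alts))
print(fixed_width_lookbehind('href=http://a', 5, alts))
print(fixed_width_lookbehind('see http://a', 4, alts))
```

Because every alternative has a single width, there is exactly one starting point to try per alternative, which is what keeps this approach cheap.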

mbeckish
Why not use the same algorithm for `lookahead` and `lookbehind`? Isn't the prototype the same?
wamp
+2  A: 

Lookahead and lookbehind aren't nearly as similar as their names imply. The lookahead expression works exactly the same as it would if it were a standalone regex, except it's anchored at the current match position and it doesn't consume what it matches.

Lookbehind is a whole different story. Starting at the current match position, it steps backward through the text one character at a time, attempting to match its expression at each position. In cases where no match is possible, the lookbehind has to go all the way to the beginning of the text (one character at a time, remember) before it gives up. Compare that to the lookahead expression, which gets applied exactly once.

This is a gross oversimplification, of course, and not all flavors work that way, but you get the idea. The way lookbehinds are applied is fundamentally different from (and much, much less efficient than) the way lookaheads are applied. It only makes sense to put a limit on how far back the lookbehind has to look.
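To make the cost difference concrete, here is a small Python sketch of the backward-stepping idea described above. It is the same oversimplification, not how any particular engine is implemented, and `naive_lookbehind` is a name invented for the example.

```python
import re

def naive_lookbehind(text, pos, pattern):
    """Try every possible start position, walking backward from pos.

    Returns (matched, attempts) so the amount of work is visible.
    """
    compiled = re.compile(pattern)
    attempts = 0
    for start in range(pos, -1, -1):
        attempts += 1
        # The candidate must span exactly from start up to pos.
        if compiled.fullmatch(text, start, pos):
            return True, attempts
    return False, attempts

pattern = r'(?:src|href)=["\']?'
hit = 'x' * 50 + 'href="'
miss = 'x' * 50

print(naive_lookbehind(hit, len(hit), pattern))
print(naive_lookbehind(miss, len(miss), pattern))
```

When the text just before the position matches, the loop stops after a handful of attempts. When nothing matches, it tries every start position back to the beginning of the text, whereas a lookahead at the same position would be applied only once.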

Alan Moore