They are called lookarounds; they let you assert whether a pattern matches at the current position without consuming any characters. There are 4 basic lookarounds:
- Positive lookarounds: see if we CAN match the pattern...
  - (?=pattern) - ... to the right of current position (look ahead)
  - (?<=pattern) - ... to the left of current position (look behind)
- Negative lookarounds: see if we can NOT match the pattern...
  - (?!pattern) - ... to the right
  - (?<!pattern) - ... to the left
As an easy reminder, for a lookaround:
- = is positive, ! is negative
- < is look behind, otherwise it's look ahead
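To make the four forms concrete, here is a minimal Python sketch using the standard re module. The sample string and the helper name starts are made up for illustration; each call reports where foo matches under a different lookaround.

```python
import re

# Hypothetical sample text, chosen so each lookaround picks different positions.
text = "foobar foobaz barfoo"

def starts(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

followed_by_bar = starts(r"foo(?=bar)")       # positive lookahead
not_followed_by_bar = starts(r"foo(?!bar)")   # negative lookahead
preceded_by_bar = starts(r"(?<=bar)foo")      # positive lookbehind
not_preceded_by_bar = starts(r"(?<!bar)foo")  # negative lookbehind

# The lookaround itself is zero-width: the matched text is only "foo".
matched_text = re.search(r"foo(?=bar)", text).group(0)

print(followed_by_bar, not_followed_by_bar)
print(preceded_by_bar, not_preceded_by_bar)
print(matched_text)
```

Note that Python's re module requires lookbehind patterns to have a fixed width, which is fine for a literal like bar.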
But why use lookarounds?
One might argue that the lookarounds in the pattern above aren't necessary, and that #([^#]+)# will do the job just fine (extracting the string captured by \1 to get the non-# text).
Not quite. The difference is that since a lookaround doesn't consume the #, that # can be "used" again by the next attempt to find a match. Simplistically speaking, lookarounds allow "matches" to overlap.
Consider the following input string:
and #one# and #two# and #three#four#
Now, #([a-z]+)# will give the following matches (as seen on rubular.com):
and #one# and #two# and #three#four#
    \___/     \___/     \_____/
Compare this with (?<=#)[a-z]+(?=#), which matches:
and #one# and #two# and #three#four#
     \_/       \_/       \___/ \__/
Unfortunately the lookbehind version can't be demonstrated on rubular.com, since it doesn't support lookbehind. However, it does support lookahead, so we can do something similar with #([a-z]+)(?=#), which matches (as seen on rubular.com):
and #one# and #two# and #three#four#
    \__/      \__/      \____/\___/
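The three patterns can also be compared outside rubular.com. The following is a small sketch in Python, whose re module supports both lookahead and fixed-width lookbehind; the helper name matches is just for illustration.

```python
import re

text = "and #one# and #two# and #three#four#"

def matches(pattern):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# Consumes both # characters, so "four" loses its opening #.
consuming = matches(r"#([a-z]+)#")

# Consumes neither #, so the # between "three" and "four" serves both.
lookaround = matches(r"(?<=#)[a-z]+(?=#)")

# Consumes only the opening #; the closing one stays available.
lookahead_only = matches(r"#([a-z]+)(?=#)")

print(consuming)       # ['#one#', '#two#', '#three#']
print(lookaround)      # ['one', 'two', 'three', 'four']
print(lookahead_only)  # ['#one', '#two', '#three', '#four']
```

With the consuming pattern, the # after three is already part of the previous match, so the search for the next match resumes at f and never finds an opening # for four.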