ansaurus

Question

Regex lookahead, lookbehind and atomic groups

Answer 1

+1 A:

You can find a detailed description by following these links:

Mark Byers 2010-06-04 11:01:04

Answer 2

A:

Have a look at the RegexBuddy help under 'Regular Expression Tutorial' > 'Lookahead and Lookbehind'. This details what the assertions do and will give you some examples.

Ti M 2010-06-04 11:04:37

Answer 3

A:

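Given the string foobarbarfoo:

bar(?!bar) finds the second bar in the string (a bar that is not followed by bar).
bar(?=bar) finds the first bar in the string (a bar that is followed by bar).
(?<=foo)bar finds the first bar in the string (a bar that is preceded by foo).
(?<!foo)bar finds the second bar in the string (a bar that is not preceded by foo).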

You can also combine them:

(?<=foo)bar(?=bar)
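The claims above are easy to check. The following is a minimal sketch using Python's re module (my choice of language; the answer does not name one), and the helper first_match_start is a hypothetical name introduced only for this illustration. It reports where the first match of each pattern starts in foobarbarfoo.

```python
import re

text = "foobarbarfoo"  # the two "bar"s start at index 3 and index 6


def first_match_start(pattern):
    """Return the start index of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.start() if m else None


starts = {
    p: first_match_start(p)
    for p in [
        r"bar(?!bar)",          # bar not followed by bar
        r"bar(?=bar)",          # bar followed by bar
        r"(?<=foo)bar",         # bar preceded by foo
        r"(?<!foo)bar",         # bar not preceded by foo
        r"(?<=foo)bar(?=bar)",  # both conditions combined
    ]
}

for pattern, start in starts.items():
    print(f"{pattern:22} -> index {start}")
```

Index 3 is the first bar and index 6 is the second, so the combined pattern lands on the first bar: it is the only one that has foo before it and bar after it.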

skyfoot 2010-06-04 11:06:12

Answer 4

+4 A:

Lookarounds are zero-width assertions. They check whether a regex matches to the right (lookahead) or to the left (lookbehind) of the current position, succeed or fail depending on whether that regex matches (positive) or does not match (negative), and then discard whatever was matched. They don't consume any characters: matching for the regex that follows them (if any) starts at the same cursor position.

Read regular-expressions.info for more details.
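To make "zero width" concrete, here is a small sketch in Python's re module (an assumption on my part; any flavor with lookahead behaves the same way here). The strings and variable names are made up for illustration.

```python
import re

# The lookahead requires "bar" to follow, but "bar" is not part of the match.
m = re.search(r"foo(?=bar)", "foobar")
matched_text = m.group()  # only "foo" is reported
match_end = m.end()       # the match ends right after "foo"

# Because nothing was consumed, a pattern placed after the lookahead
# starts matching at the very position where the lookahead started.
m2 = re.search(r"foo(?=bar)bar", "foobar")
full_text = m2.group()

print(matched_text, match_end, full_text)
```

The first match stops at index 3 even though the engine looked at the characters up to index 6, and in the second pattern the literal bar consumes the same characters the lookahead had already inspected.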

Positive lookahead:

Syntax:

(?=REGEX_1)REGEX_2

Match only if REGEX_1 matches; after matching REGEX_1, the match is discarded and searching for REGEX_2 starts at the same position.

example:

(?=[a-z0-9]{4}$)[a-z]{1,2}[0-9]{2,3}

REGEX_1 is [a-z0-9]{4}$, which matches four lowercase alphanumeric characters followed by the end of the line.
REGEX_2 is [a-z]{1,2}[0-9]{2,3}, which matches one or two letters followed by two or three digits.

REGEX_1 makes sure that the length of the string is indeed 4, but doesn't consume any characters, so the search for REGEX_2 starts at the same location. REGEX_2 then makes sure that the string also follows the letters-then-digits rule. Without the look-ahead, REGEX_2 alone would also match strings of length three or five.
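A quick way to see the effect of the look-ahead is to run the pattern with and without it. This is a sketch in Python's re module; the sample strings are made up, and I am assuming the match is attempted from the start of the string (re.match), since the pattern itself has no leading anchor.

```python
import re

with_lookahead = re.compile(r"(?=[a-z0-9]{4}$)[a-z]{1,2}[0-9]{2,3}")
without_lookahead = re.compile(r"[a-z]{1,2}[0-9]{2,3}")

samples = ["ab12", "a123", "a12", "ab123", "abc1"]

# match() only tries position 0, so the look-ahead sees the whole string.
accepted_with = [s for s in samples if with_lookahead.match(s)]

# fullmatch() forces REGEX_2 on its own to cover the whole string.
accepted_without = [s for s in samples if without_lookahead.fullmatch(s)]

print("with look-ahead:   ", accepted_with)
print("without look-ahead:", accepted_without)
```

With the look-ahead only the four-character strings ab12 and a123 pass; without it, a12 (length three) and ab123 (length five) pass as well. abc1 has the right length but fails REGEX_2 in both cases.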

Negative lookahead:

Syntax:

(?!REGEX_1)REGEX_2

Match only if REGEX_1 does not match; after checking REGEX_1, the search for REGEX_2 starts at the same position.

example:

(?!.*\bFWORD\b)\w{10,30}$

The look-ahead part scans the rest of the string for FWORD and fails if it finds it. If it doesn't find FWORD, the look-ahead succeeds, and the following part verifies that the string's length is between 10 and 30 and that it contains only word characters (a-z, A-Z, 0-9 and _). This assumes the match is attempted from the start of the string.
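Here is a sketch of the same idea in Python's re module, with two changes that are mine and not the answer's. First, I add a leading ^ so the check always starts at the beginning of the string. Second, I drop the \b around FWORD: since \w{10,30} only admits word characters, a FWORD embedded in an accepted string always has word characters next to it, so the \b version would never reject it. FWORD is the answer's placeholder for a forbidden word, and is_valid is a hypothetical helper name.

```python
import re

# Variant of the pattern above: anchored with ^ and without \b (see lead-in).
pattern = re.compile(r"^(?!.*FWORD)\w{10,30}$")


def is_valid(s):
    """True if s is 10 to 30 word characters long and does not contain FWORD."""
    return pattern.search(s) is not None


for candidate in ["hello_world_1", "helloFWORD123", "short", "has space here"]:
    print(f"{candidate!r:18} -> {is_valid(candidate)}")
```

The look-ahead rejects helloFWORD123 before \w{10,30}$ is even tried; short and has space here pass the look-ahead but fail the length and character checks.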

Look-behind is similar to look-ahead: it just looks behind the current cursor position. Some regex flavors, such as JavaScript before ECMAScript 2018, don't support look-behind assertions, and most flavors that do support it (PHP, Python, etc.) require the look-behind portion to be fixed length.
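The sketch below shows positive and negative look-behind in Python's re module, along with the fixed-length restriction mentioned above. The sample text and variable names are invented for the example.

```python
import re

text = "cost $42 for 17 items"

# Positive look-behind: digits that are immediately preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Negative look-behind: whole numbers that are not preceded by "$".
# The \b stops the match from starting in the middle of "42".
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)

# Python's re rejects a look-behind whose length is not fixed.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(after_dollar, not_after_dollar, variable_width_ok)
```

In Python the variable-length pattern fails at compile time, so the restriction shows up as soon as the regex is built rather than when it is first used.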

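An atomic group, written (?>REGEX), throws away the backtracking positions remembered inside the group as soon as the group has matched: if the rest of the pattern then fails, the engine cannot go back into the group and try a different way of matching it. Check this page for examples of atomic groups.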
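The difference shows up when an alternation inside the group would need to be retried. The sketch below uses Python's re module. To the best of my knowledge the (?>...) syntax is only accepted there from Python 3.11 on, so this example emulates an atomic group with the common (?=(...))\1 idiom, which relies on lookaheads themselves not being backtracked into. The pattern and strings are made up for illustration.

```python
import re

# Ordinary group: "bc" is tried first, the final "c" then fails, so the
# engine backtracks into the group, takes "b" instead, and succeeds.
plain = re.search(r"a(?:bc|b)c", "abc")

# Emulated atomic group, equivalent in spirit to a(?>bc|b)c:
# the lookahead captures "bc", \1 consumes it, the final "c" fails,
# and the engine is not allowed to retry the group with "b".
atomic = re.search(r"a(?=(bc|b))\1c", "abc")

# With one more "c" in the input, the first alternative is enough.
atomic_longer = re.search(r"a(?=(bc|b))\1c", "abcc")

print(plain.group(), atomic, atomic_longer.group())
```

The ordinary group matches abc, the atomic version does not match abc at all, and both would match abcc. That refusal to reconsider the group is what makes atomic groups useful for cutting off expensive backtracking.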

Amarghosh 2010-06-04 11:23:21

Great explanation, Respect

Spidfire 2010-06-04 11:37:27
