I'm trying to write a regex to parse a (seemingly very simple) piece of text like this:

some stuff
First name: John
Last name: Smith
more stuff

I want to capture the first and last name, so I tried a regex like this:

(?<=First name:\s*)(?<FirstName>\w+)(?<=\s*Last name:\s*)(?<LastName>\w+)

This fails to find a match. Each part (first name and last name) works on its own, but they don't work together. Also, the following works:

(?<=John\s*Last name:\s*)(?<LastName>\w+)

but when I move "John" out of the look-behind...

John(?<=\s*Last name:\s*)(?<LastName>\w+)

... it doesn't match!
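A minimal Python reproduction of those last two patterns, as a sketch under two assumptions: Python's re module only accepts fixed-width look-behind, so \s (exactly one whitespace character) stands in for the question's \s*, and Python spells named groups (?P<name>...). The helper last_name is hypothetical.

```python
import re
from typing import Optional

# Sample input from the question.
TEXT = "some stuff\nFirst name: John\nLast name: Smith\nmore stuff\n"

# Python's re rejects \s* inside a look-behind, so \s (one whitespace
# character) stands in for it here.
WORKS = r"(?<=John\sLast name:\s)(?P<LastName>\w+)"
FAILS = r"John(?<=\sLast name:\s)(?P<LastName>\w+)"


def last_name(pattern: str) -> Optional[str]:
    """Hypothetical helper: return the LastName group, or None if no match."""
    m = re.search(pattern, TEXT)
    return m.group("LastName") if m else None


print(last_name(WORKS))  # Smith
print(last_name(FAILS))  # None
```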

What am I doing wrong here?

A: 

Just realised that I probably don't need the look-behind, because the following works:

First name:\s*(?<FirstName>\w+)\s*Last name:\s*(?<LastName>\w+)
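For reference, a sketch of the same pattern used from Python (the thread doesn't name a language, and Python writes named groups as (?P<name>...) rather than (?<name>...)). The \s* between the two names also consumes the newline, so no extra flag is needed here:

```python
import re

TEXT = "some stuff\nFirst name: John\nLast name: Smith\nmore stuff\n"

# Look-behind-free pattern: the text between the names is consumed
# by \s*Last name:\s*, so the second group can continue from there.
PATTERN = re.compile(
    r"First name:\s*(?P<FirstName>\w+)\s*Last name:\s*(?P<LastName>\w+)"
)

m = PATTERN.search(TEXT)
if m:
    print(m.group("FirstName"), m.group("LastName"))  # John Smith
```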

Nevertheless, for future reference, I'd like to know why it doesn't work with the look-behind.

Evgeny
A: 

I think you need to make sure that newlines are matchable in whatever regex language you're using.

In Python, this means passing re.DOTALL to re.compile() or whichever re function you're using. In Perl, add the s modifier after the closing slash.
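Whether the flag matters depends on how the pattern crosses the line break. In Python, re.DOTALL only changes what . matches; \s matches a newline with or without it. A quick check, as a sketch:

```python
import re

SAMPLE = "John\nLast"

# "." does not match "\n" unless re.DOTALL is passed.
dot_plain = re.search(r"John.Last", SAMPLE)
dot_dotall = re.search(r"John.Last", SAMPLE, re.DOTALL)

# \s matches "\n" regardless of flags.
ws_plain = re.search(r"John\sLast", SAMPLE)

print(dot_plain is not None, dot_dotall is not None, ws_plain is not None)  # False True True
```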

a paid nerd
A: 

Since look-behind assertions are zero-width (i.e. they don't consume any characters), the FirstName group captures whatever follows "First name:", in this case "John". After that match, the engine's position in the target string is immediately after "John". But the next part of the regex is another look-behind, so the engine checks whether the text immediately preceding its current position matches the look-behind text, in this case "Last name:". Since that position is actually preceded by "John", the whole regex fails and never even gets to "Smith". The John(?<=\s*Last name:\s*) variant fails for the same reason: once "John" is consumed, the look-behind is tested at the position right after "John", and "Last name:" does not precede that position.
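The same reasoning can be checked in Python. This is a sketch, not the original poster's code: because Python's re only allows fixed-width look-behind, \s replaces \s*, and groups use the (?P<name>...) spelling.

```python
import re

TEXT = "some stuff\nFirst name: John\nLast name: Smith\nmore stuff\n"

# Fixed-width stand-ins for the question's look-behinds (\s instead of \s*).
FIRST = r"(?<=First name:\s)(?P<FirstName>\w+)"
LAST = r"(?<=Last name:\s)(?P<LastName>\w+)"

# Each half matches on its own.
first = re.search(FIRST, TEXT)
last = re.search(LAST, TEXT)
print(first.group("FirstName"), last.group("LastName"))  # John Smith

# After FirstName consumes "John", the engine sits at first.end(). LAST's
# look-behind would inspect the 11 characters before that position, which
# are " name: John" rather than "Last name: ".
print(repr(TEXT[first.end() - 11:first.end()]))  # ' name: John'

# So the concatenation never matches...
combined = re.search(FIRST + LAST, TEXT)
print(combined)  # None

# ...whereas consuming the text between the names lets the match continue.
fixed = re.search(FIRST + r"\s*Last name:\s*(?P<LastName>\w+)", TEXT)
print(fixed.group("FirstName"), fixed.group("LastName"))  # John Smith
```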

Bryan