I'm looking for a multiline regex that will match occurrences only after a blank line. For example, given the sample email below, I'd like to match "From: Alex". ^From:\s*(.*)$ works to match any From line, but I want it restricted to lines in the body (anything after the first blank line).

Received: from a server
Date: today
To: Ted
From: James
Subject: [fwd: hi]

fyi

----- Forwarded Message -----
To: James
From: Alex
Subject: hi

Party!

A: 

Writing complicated regular expressions for such jobs is a bad idea IMO. It's better to combine several simple queries. For example, first search for "\r\n\r\n" to find the start of the body, then run the simple regex over the body.
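
A minimal sketch of this two-step approach, written in Python rather than C# purely for illustration. The function name body_from_values and the SAMPLE string are made up for the example, and it assumes the blank line may use either LF or CRLF line endings:

```python
import re

# Hypothetical sample, copied from the question (LF line endings assumed).
SAMPLE = (
    "Received: from a server\n"
    "Date: today\n"
    "To: Ted\n"
    "From: James\n"
    "Subject: [fwd: hi]\n"
    "\n"
    "fyi\n"
    "\n"
    "----- Forwarded Message -----\n"
    "To: James\n"
    "From: Alex\n"
    "Subject: hi\n"
    "\n"
    "Party!\n"
)


def body_from_values(message):
    """Return the values of all From: lines found after the first blank line."""
    # Step 1: split headers from body at the first blank line (LF or CRLF).
    parts = re.split(r"\r?\n\r?\n", message, maxsplit=1)
    if len(parts) < 2:
        return []  # no blank line, so there is no body to search
    body = parts[1]
    # Step 2: run the simple per-line regex over the body only.
    # [ \t]* is used instead of \s* so the match cannot spill onto the next line.
    matches = re.finditer(r"^From:[ \t]*(.*)$", body, re.MULTILINE)
    return [m.group(1).strip() for m in matches]  # strip() drops a trailing \r


print(body_from_values(SAMPLE))  # prints ['Alex']
```

Keeping the split and the search separate means each regex stays simple enough to check by eye.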

Sebastian Redl
A: 

I'm not sure of the syntax of C# regular expressions, but you should have a way to anchor to the beginning of the string (as opposed to ^, which anchors to the beginning of a line in multiline mode). I'll call that "\A" in my example:

\A.*?\r?\n\r?\n.*?^From:\s*([^\r\n]+)\r?$

Make sure you turn on the multiline option so that ^ and $ match at line boundaries, and also the single-line (dot-all) option, however that works, so that "." matches \n.
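
As a rough illustration of the anchored pattern, here is a Python sketch (not C#). It assumes Python's re.DOTALL and re.MULTILINE play the roles of Singleline and Multiline, adds \r? before $ so CRLF input also works, and uses [ \t]* in place of \s* so the capture stays on one line. The names BODY_FROM, first_body_from and SAMPLE are hypothetical:

```python
import re

# Hypothetical sample, copied from the question (LF line endings assumed).
SAMPLE = (
    "Received: from a server\n"
    "Date: today\n"
    "To: Ted\n"
    "From: James\n"
    "Subject: [fwd: hi]\n"
    "\n"
    "fyi\n"
    "\n"
    "----- Forwarded Message -----\n"
    "To: James\n"
    "From: Alex\n"
    "Subject: hi\n"
    "\n"
    "Party!\n"
)

# \A pins the match to the start of the string, the lazy .*? skips ahead to
# the first blank line, and only then is ^From: allowed to match.
BODY_FROM = re.compile(
    r"\A.*?\r?\n\r?\n.*?^From:[ \t]*([^\r\n]+)\r?$",
    re.DOTALL | re.MULTILINE,  # "." matches newlines; ^ and $ match per line
)


def first_body_from(message):
    """Return the value of the first From: line after the first blank line."""
    m = BODY_FROM.search(message)
    return m.group(1) if m else None


print(first_body_from(SAMPLE))  # prints Alex
```

Note that this finds only the first From: line in the body, since the match is anchored at the start of the string.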

Loren Segal
Thanks. What I needed was to turn on Singleline mode too, and use .*? instead of .*.
A: 

This is using a look-behind assertion. Group 1 will give you the "From" line, and group 2 will give you the actual value ("Alex", in your example).

(?<=\n\n).*(From:\s*(.*?))$
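
A quick way to try this pattern, sketched in Python rather than C#. The answer does not say which options to use; this sketch assumes both dot-all and multiline are needed (re.DOTALL and re.MULTILINE here). Because .* is greedy, it picks up the last From: line in the body if there are several, and the \n\n look-behind does not match CRLF input. The names LOOKBEHIND_FROM, last_body_from and SAMPLE are hypothetical:

```python
import re

# Hypothetical sample, copied from the question (LF line endings assumed).
SAMPLE = (
    "Received: from a server\n"
    "Date: today\n"
    "To: Ted\n"
    "From: James\n"
    "Subject: [fwd: hi]\n"
    "\n"
    "fyi\n"
    "\n"
    "----- Forwarded Message -----\n"
    "To: James\n"
    "From: Alex\n"
    "Subject: hi\n"
    "\n"
    "Party!\n"
)

# The look-behind requires a blank line (two consecutive \n) right before
# the starting position of the match.
LOOKBEHIND_FROM = re.compile(
    r"(?<=\n\n).*(From:\s*(.*?))$",
    re.DOTALL | re.MULTILINE,  # assumed options, not stated in the answer
)


def last_body_from(message):
    """Return (whole From line, value) for a From: line in the body, or None."""
    m = LOOKBEHIND_FROM.search(message)
    return (m.group(1), m.group(2)) if m else None


print(last_body_from(SAMPLE))  # prints ('From: Alex', 'Alex')
```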
gregmac
A: 
\s{2,}.+?(.+?From:\s(?<Sender>.+?)\s)+?

The \s{2,} matches at least two consecutive whitespace characters (such as the two newlines that form a blank line), so your first From: James won't match. Then it's just a matter of looking for the next "From:" and capturing from there.

Use this with RegexOptions.Singleline and RegexOptions.ExplicitCapture; the latter means the unnamed outer group won't be captured.
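
A Python sketch of the same pattern, for experimentation only. Python has no ExplicitCapture option, so the outer group is written as a non-capturing (?:...) group, and the named group uses Python's (?P<Sender>...) spelling. One caveat this sketch exposes: with CRLF line endings every line break is already two whitespace characters, so \s{2,} matches inside the headers and the first From: line is returned instead. The names SENDER, sender_after_gap and SAMPLE are hypothetical:

```python
import re

# Hypothetical sample, copied from the question (LF line endings assumed).
SAMPLE = (
    "Received: from a server\n"
    "Date: today\n"
    "To: Ted\n"
    "From: James\n"
    "Subject: [fwd: hi]\n"
    "\n"
    "fyi\n"
    "\n"
    "----- Forwarded Message -----\n"
    "To: James\n"
    "From: Alex\n"
    "Subject: hi\n"
    "\n"
    "Party!\n"
)

# re.DOTALL stands in for RegexOptions.Singleline; (?:...) stands in for
# the effect of RegexOptions.ExplicitCapture on the outer group.
SENDER = re.compile(
    r"\s{2,}.+?(?:.+?From:\s(?P<Sender>.+?)\s)+?",
    re.DOTALL,
)


def sender_after_gap(message):
    """Return the Sender group of the first match, or None."""
    m = SENDER.search(message)
    return m.group("Sender") if m else None


print(sender_after_gap(SAMPLE))                        # prints Alex
print(sender_after_gap(SAMPLE.replace("\n", "\r\n")))  # prints James
```

If the input may contain CRLF line endings, matching the blank line explicitly (for example \r?\n\r?\n) instead of \s{2,} avoids the false hit.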

Teetow