I'm nearly done with a trackback system for my website, but have one last niggling regular expression I just can't get right.
What I'm after is an excerpt of the referring page. I define the most relevant excerpt as the first paragraph (marked by `<p></p>` tags) that follows an `<h1></h1>`, `<h2></h2>`, or `<h3></h3>` in the HTML source of the page.
For instance, I can successfully fetch the contents of the `<title></title>` tag from the HTML as follows:
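```csharp
using System.Text.RegularExpressions;

// [^>]* tolerates attributes on <title>; the lazy [\s\S]*? stops at the first </title>.
Regex reTITLE = new Regex( @"(?<=<title[^>]*>)([\s\S]*?)(?=</title>)",
                           RegexOptions.IgnoreCase );
Match match = reTITLE.Match( strHTMLSource );
if (match.Success)
{
    strReferringPageTitle = match.Value.Trim( );
}
```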
My question: what regular expression can I use to fetch the excerpt described at the start of this post?
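One possible starting point, sketched under assumptions rather than offered as a definitive answer: a single .NET regex that finds the first `<h1>`, `<h2>` or `<h3>` element, lazily skips whatever follows it, and captures the body of the next `<p>` element. The names `ExcerptSketch` and `GetExcerpt` are hypothetical. The pattern assumes reasonably well-formed HTML with no nested paragraphs or headings. Like any regex over HTML, it is fragile against malformed markup, comments, or scripts that contain tag-like text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExcerptSketch
{
    // Hypothetical helper (not part of any library). Assumes well-formed HTML
    // with no nested <p> or heading tags.
    static readonly Regex ReExcerpt = new Regex(
        @"<h([1-3])\b[^>]*>[\s\S]*?</h\1\s*>"   // first h1-h3 element; \1 requires the matching closing level
        + @"[\s\S]*?"                           // lazily skip anything between the heading and the paragraph
        + @"<p\b[^>]*>([\s\S]*?)</p\s*>",       // next <p> (\b excludes <pre>, <param>); group 2 is its body
        RegexOptions.IgnoreCase);

    // Returns the trimmed inner HTML of the first <p> after the first h1-h3, or null if none exists.
    public static string GetExcerpt(string html)
    {
        Match match = ReExcerpt.Match(html);
        return match.Success ? match.Groups[2].Value.Trim() : null;
    }

    public static void Main()
    {
        string html = "<html><head><title>T</title></head><body>"
            + "<p>Nav text</p><h2 class=\"x\">Heading</h2>"
            + "<div>ignored</div><p>The excerpt we want.</p><p>Later text</p>"
            + "</body></html>";
        Console.WriteLine(GetExcerpt(html)); // prints: The excerpt we want.
    }
}
```

The paragraph before the heading ("Nav text") is skipped because the match cannot start until it finds `<h1>` to `<h3>`. Lazy quantifiers then keep the match from running past the first following paragraph. Note that the captured group is inner HTML, so any inline tags such as `<a>` or `<em>` would still need stripping before display.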
PS: I love Stack Overflow and this community. Great job, Joel & Co.!