I am trying to write a regular expression that extracts the text between <p> and </p> tags. There will be up to three such lines in a row. I thought this might be possible using the (?= lookahead feature.

The pattern I currently use to match a single line is:

<p>([^']*?)<[/]p

Is it possible to have one regular expression that can get the text between multiple rows of tags? Each line would need to be in its own group.

An example input would be:

 <p>The</p>
 <p>Grey</p>
 <p>Fox</p>
A:

First, this would be easy with the Html Agility Pack, and you would get a more robust solution.

That said, a regex can work if you fully control the format and the input comes from a trusted source. Because the capturing group sits inside a repeated construct, .NET records every repetition in that group's Captures collection:

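// Requires: using System; using System.Text.RegularExpressions;
// Singleline lets '.' match newlines, so a paragraph may span several lines.
Match match = Regex.Match(html, @"(?:<p>(.*?)</p>\s*)+", RegexOptions.Singleline);
if (match.Success)
{
    // Group 1 repeats, so each <p> body is stored as a separate Capture.
    foreach (Capture line in match.Groups[1].Captures)
        Console.WriteLine(line.Value);
}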

Output:

The
Grey
Fox
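
The pattern above keeps every line in the captures of a single group. If each line really must land in its own numbered group, one option is to spell out the three groups and make the second and third optional. This is only a sketch under the same assumptions (trusted input, plain <p> tags without attributes, at most three in a row); the names ParagraphGroups, UpToThree and Extract are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParagraphGroups
{
    // One mandatory <p>...</p> followed by up to two optional ones.
    // Groups 1, 2 and 3 hold the first, second and third line.
    private static readonly Regex UpToThree = new Regex(
        @"<p>(.*?)</p>\s*(?:<p>(.*?)</p>\s*(?:<p>(.*?)</p>)?)?",
        RegexOptions.Singleline);

    // Returns the text of the first run of up to three consecutive paragraphs.
    public static List<string> Extract(string html)
    {
        var lines = new List<string>();
        Match match = UpToThree.Match(html);
        if (!match.Success)
            return lines;

        for (int i = 1; i <= 3; i++)
        {
            // An optional group that did not take part in the match reports Success == false.
            if (match.Groups[i].Success)
                lines.Add(match.Groups[i].Value);
        }
        return lines;
    }
}
```

Calling ParagraphGroups.Extract on the three-line example from the question returns The, Grey and Fox, in that order. With only one paragraph in the input, groups 2 and 3 are simply unset and the list has a single entry.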
Mark Byers
Why the call to `OfType`?
Martinho Fernandes
Accident. Left over from testing. (I did actually test it, honest!)
Mark Byers