I have some HTML

 <p id="errorMessage">System.Web.HttpException: Path '/DynamicData/DimOrganisations/List.aspx' was not found.</p>
 <p>Generated: Tue, 29 Sep 2009 18:04:18 GMT</p>

I want to search through my HTML for the tag

<p id="errorMessage">

and then extract the text inside it. In other words, I want to run something that finds the tag and returns whatever sits between its start and end tags, so that I end up with:

System.Web.HttpException: Path '/DynamicData/DimOrganisations/List.aspx' was not found.

Can anyone help? I am using C# 2008.



+4  A: 

You could use the HTML Agility Pack to parse the HTML and find the elements/attributes that you need.
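
As a rough sketch of what that might look like, assuming the HtmlAgilityPack assembly is referenced in the project: the class name ErrorMessageParser and the method name Extract below are made up for illustration, and the XPath expression assumes the element is a p tag with the id from the question.

```csharp
using HtmlAgilityPack;

// Hypothetical helper, not part of the HTML Agility Pack itself.
static class ErrorMessageParser
{
    // Returns the text inside <p id="errorMessage">, or null if no such element exists.
    public static string Extract(string html)
    {
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);

        // XPath: any <p> element whose id attribute equals "errorMessage".
        HtmlNode node = document.DocumentNode.SelectSingleNode("//p[@id='errorMessage']");
        if (node == null)
        {
            return null;
        }

        // InnerText drops nested markup; DeEntitize turns entities such as &amp; back into characters.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

Calling ErrorMessageParser.Extract(html) on the HTML from the question should return the exception text without the surrounding tags.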

+1: It's a powerful library, and using a parser/DOM is the best way to solve your problem.

How you go about solving this problem will depend on how general you want the solution to be. If the HTML you're examining was created by your application and you can guarantee its format, then you can probably solve the problem with a simple regular expression. That is, if you always have:

<p id="errorMessage">Error message goes here.</p>

Then a regular expression that looks for that pattern is very simple to write, test, and maintain.
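
A minimal sketch of such a regular expression, assuming the tag is always written exactly as above and the message contains no nested tags. The class name ErrorMessageExtractor and the method name Extract are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; assumes the fixed <p id="errorMessage"> format.
static class ErrorMessageExtractor
{
    // Captures everything between <p id="errorMessage"> and the first </p> after it.
    // Singleline lets "." match newlines in case the message spans several lines.
    private static readonly Regex ErrorPattern = new Regex(
        "<p\\s+id=\"errorMessage\"\\s*>(?<message>.*?)</p>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the captured message, or null if the pattern is not found.
    public static string Extract(string html)
    {
        Match match = ErrorPattern.Match(html);
        return match.Success ? match.Groups["message"].Value.Trim() : null;
    }
}
```

ErrorMessageExtractor.Extract(html) returns the text between the tags, or null when the page has no such paragraph. The pattern stops at the first closing p tag, which is why it only holds up while the message itself is plain text.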

But if you allow arbitrary HTML tags in the error message, then you'll have to go with something much more complex, like an HTML parser.

If this is an internal debugging tool, I would strongly suggest that you go with the simpler method. Format the HTML for your error messages so that it's easy to parse using the simplest method possible.

Jim Mischel