ansaurus

Question

Answer 1

+3 A:

Check out the HTML Agility Pack, which can do all sorts of HTML manipulation.

It gives you an interface somewhat similar to the XmlDocument XML handling interface:

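 using HtmlAgilityPack;

 // Load the HTML file from disk.
 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");

 // Select the <body> element with an XPath expression.
 HtmlNode bodyNode = doc.DocumentNode.SelectSingleNode("/html/body");

 // SelectSingleNode returns null when nothing matches.
 if (bodyNode != null)
 {
     // do something with the node, e.g. read bodyNode.InnerHtml
 }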

marc_s 2010-10-27 20:34:14

Answer 2

+2 A:

You may take a look at SgmlReader and HTML Agility Pack.

Darin Dimitrov 2010-10-27 20:34:32

That URL to SgmlReader leads to a very old version that hasn't been touched in years. The guys maintaining SgmlReader these days are MindTouch. I would recommend SgmlReader over HtmlAgilityPack due to its lower-level approach and active maintenance. http://developer.mindtouch.com/en/docs/SgmlReader

asbjornu 2010-10-27 21:02:06

If your HTML isn't well-formed XHTML, I think you'll find that SgmlReader (and yes, use the MindTouch version, as in the comment above) is your best bet.

nrkn 2010-10-27 23:19:36

@asbjornu - Looking through the conversion examples on the MindTouch site, I can't find a single one where SgmlReader produces a DOM that matches what browsers do. I don't know whether HTML Agility Pack is any better, but I wasn't impressed.

Alohci 2010-10-27 23:33:13

@Alohci I agree that SgmlReader isn't up to par with browser parsers, but there aren't many alternatives native to C# that do it better. HtmlAgilityPack surely doesn't.

asbjornu 2010-10-30 13:18:21

Answer 3

A:

It's easy enough to pull the page code into a string, search for the first occurrence of the string "<body" and of the string "</body", and do a little index math to extract the text between them.
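
A minimal sketch of that string-search idea, assuming the whole page fits in memory. The class and method names (BodyExtractor, ExtractBody) are made up for illustration and are not part of any library:

```csharp
using System;

public static class BodyExtractor
{
    // Returns the markup between <body ...> and </body>,
    // or null if either tag cannot be found.
    public static string ExtractBody(string html)
    {
        if (html == null) return null;

        // The opening tag may carry attributes, so search for "<body" only.
        int open = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        if (open < 0) return null;

        // The content starts right after the '>' that ends the opening tag.
        int contentStart = html.IndexOf('>', open);
        if (contentStart < 0) return null;
        contentStart++;

        // Look for the closing tag only after the start of the content.
        int close = html.IndexOf("</body", contentStart, StringComparison.OrdinalIgnoreCase);
        if (close < 0) return null;

        return html.Substring(contentStart, close - contentStart);
    }
}
```

It could be called as BodyExtractor.ExtractBody(File.ReadAllText("file.htm")). Note that this is plain text matching, not parsing: it can be fooled by "<body" appearing inside a comment or script, or by a '>' inside an attribute value of the opening tag.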

Dutchie432 2010-10-27 20:36:10

Answer 4

A:

If it happens to be XHTML, then you could use XPath.
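
As a rough sketch of what that could look like with the built-in System.Xml classes (the XhtmlBodyReader and SelectBody names are hypothetical, and the input is assumed to be well-formed XHTML). XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so the XPath expression needs a prefix bound to it; a plain "/html/body" would not match:

```csharp
using System.IO;
using System.Xml;

public static class XhtmlBodyReader
{
    // Returns the <body> element of a well-formed XHTML document,
    // or null if there is none.
    public static XmlNode SelectBody(string xhtml)
    {
        // Skip the DOCTYPE instead of trying to download the DTD.
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore,
            XmlResolver = null
        };

        var doc = new XmlDocument();
        using (var reader = XmlReader.Create(new StringReader(xhtml), settings))
        {
            doc.Load(reader);
        }

        // Bind an arbitrary prefix ("x") to the XHTML namespace for XPath.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml");

        return doc.SelectSingleNode("/x:html/x:body", ns);
    }
}
```

Because the DTD is ignored here, named entities such as &nbsp; are not defined and will make the load fail; documents that rely on them need a different DTD setup or one of the HTML parsers mentioned in the other answers.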

Bryan 2010-10-27 20:58:12

Answer 5

A:

Use the XML classes with XPath (if you want ONLY the specified node). For more advanced manipulation of HTML, use the HTML Agility Pack.

Tomas Voracek 2010-10-27 21:01:10

Read <body> tag of html file using c#