+1  Q: 

Html string reader

Hi all,

I need to load HTML and parse it. I think it should be something simple: I pass in a string containing HTML, and it is read into a DOM-like object that I can search and traverse, which would make scraping and similar tasks easier.

Do you know of anything like that?

Thanks

+9  A: 

HTML Agility Pack

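It has an API similar to XmlDocument. For example (adapted from the examples page):

 // Requires: using HtmlAgilityPack;
 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 // SelectNodes returns null when no node matches the XPath expression.
 var links = doc.DocumentNode.SelectNodes("//a[@href]");
 if (links != null)
 {
     foreach (HtmlNode link in links)
     {
         HtmlAttribute att = link.Attributes["href"];
         att.Value = FixLink(att); // FixLink is your own helper, not part of the library
     }
 }
 doc.Save("file.htm");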

(You can also use LoadHtml to load HTML from a string rather than from a file path.)
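
Since the question is about parsing a string, here is a minimal sketch of the LoadHtml route. It assumes the HtmlAgilityPack package is referenced in the project; the sample markup and the class name are made up for illustration.

```csharp
using System;
using HtmlAgilityPack; // third-party package, assumed to be referenced

class LoadHtmlSketch
{
    static void Main()
    {
        // Sample markup, invented for this illustration.
        string html = "<html><body><a href=\"/home\">Home</a><a name=\"x\">No link</a></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        // SelectNodes returns null when nothing matches, so guard against it.
        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null) return;

        foreach (HtmlNode link in links)
        {
            // GetAttributeValue returns the supplied default if the attribute is missing.
            Console.WriteLine(link.GetAttributeValue("href", "") + " -> " + link.InnerText);
        }
    }
}
```

The XPath `//a[@href]` selects only anchors that have an href attribute, so the second anchor in the sample is skipped.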

Marc Gravell
Do you know whether that compiles against the Silverlight libraries?
AnthonyWJones
@Anthony - no clue whatsoever, sorry.
Marc Gravell
I don't have access to these things in WCF. I had to parse it line by line, which was very dumb and hard.
Oakcool
+1  A: 

If you're running in-browser, you should be able to use the HTML DOM Bridge: load the HTML into it and walk the DOM tree.

JustinAngel