
I'm looking for a library to parse HTML and extract links, forms, tags, etc.

LGPL or other licenses that are friendly to commercial development are preferred.

Do you have experience with any of these libraries? Or could you recommend a similar one?

+6  A: 

The HTML Agility Pack has examples of exactly this kind of thing, and it uses XPath for familiar queries. For example (adapted from its home page), given an already loaded HtmlDocument doc, finding all links is simply:

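var links = doc.DocumentNode.SelectNodes("//a[@href]"); // returns null when nothing matches
if (links != null) {
    foreach (HtmlNode link in links) {
        string href = link.GetAttributeValue("href", string.Empty); //...
    }
}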
Marc Gravell
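The question also mentions forms, and the same XPath approach extends to them. This is a minimal sketch, not taken from the HAP home page: it assumes the HtmlAgilityPack NuGet package is referenced, and FormExtractor and ExtractForms are hypothetical names used only for illustration. It also assumes a known parsing quirk applies: some HAP versions flag <form> as an empty element, so the sketch removes that flag from HtmlNode.ElementsFlags; check whether your version needs this.

```csharp
// Sketch only: assumes the HtmlAgilityPack NuGet package is referenced.
// FormExtractor/ExtractForms are hypothetical names, not part of the library.
using System.Collections.Generic;
using HtmlAgilityPack;

public static class FormExtractor
{
    // Returns one entry per <form>: its action attribute and the names of its fields.
    public static List<(string Action, List<string> Fields)> ExtractForms(string html)
    {
        // Assumption: some HAP versions flag <form> as an empty element, which parses
        // its fields as siblings instead of children; dropping the flag restores nesting.
        // ElementsFlags is static, so this affects every parse in the process.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<(string Action, List<string> Fields)>();

        // SelectNodes returns null rather than an empty collection when nothing matches.
        HtmlNodeCollection forms = doc.DocumentNode.SelectNodes("//form");
        if (forms == null)
            return result;

        foreach (HtmlNode form in forms)
        {
            var fields = new List<string>();
            // The leading "." keeps each query relative to this form, not the whole document.
            HtmlNodeCollection named = form.SelectNodes(
                ".//input[@name] | .//select[@name] | .//textarea[@name]");
            if (named != null)
            {
                foreach (HtmlNode field in named)
                    fields.Add(field.GetAttributeValue("name", string.Empty));
            }
            result.Add((form.GetAttributeValue("action", string.Empty), fields));
        }
        return result;
    }
}
```

Returning only field names keeps the sketch short; GetAttributeValue can read each field's type or value attribute the same way.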
HTML Agility Pack is awesome; I also recommend it.
Matt Olenik
Agreed. We used it in a production environment, where we parsed approximately 50,000 (X)HTML files per hour, for a couple of years straight. It worked great.
Chris