I'm looking for a fast, lightweight open-source HTML parser -- something along the lines of a non-validating SAX parser (except, of course, for HTML).

The answers to this question cover a parser that generates a DOM (which I don't want), and these answers suggest converting the HTML to XML before sending it to Xerces (which I can't do in my case).

Any suggestions?

+1  A: 

Maybe this is something for you: http://www.codeproject.com/kb/library/GomzyHTMLReader.aspx
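For illustration only, here is a minimal sketch of the SAX-style, non-validating event model the question describes. It uses Python's standard `html.parser` module, not the linked library, and the names `EventCollector` and `parse_events` are made up for this example. The parser reports tags and text through callbacks as it scans the input and never builds a DOM.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Records start/end/text events in document order; builds no tree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs to keep the event list short
        text = data.strip()
        if text:
            self.events.append(("text", text))


def parse_events(html):
    """Hypothetical helper: feed a string and return the collected events."""
    parser = EventCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.events


if __name__ == "__main__":
    # Unclosed <li> tags are reported as-is; nothing is validated or repaired.
    for event in parse_events("<ul><li>one<li>two</ul>"):
        print(event)
```

Because the parser does not validate, missing end tags simply produce no "end" event, so the handler has to tolerate unbalanced input.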

cyphorious