+4  Q: 

.NET Html Parser

This must be the 20th duplicate or so. Here is one: Looking for C# HTML parser

I'm looking for an open source, fast, W3C-equivalent HTML/XHTML parser for C# that doesn't need native DLLs. Thanks.

+7  A: 

HTML Agility Pack
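
A minimal sketch of what using it might look like, assuming the HtmlAgilityPack NuGet package is referenced. The `LinkExtractor` class and `ExtractLinks` method are hypothetical names made up for illustration; only `HtmlDocument`, `LoadHtml`, `DocumentNode`, `SelectNodes` and `GetAttributeValue` come from the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack package is referenced

public static class LinkExtractor
{
    // Hypothetical helper: collects the href of every <a> element in the markup.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // accepts markup with unclosed or mis-nested tags

        var links = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (var anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}
```

Calling `LinkExtractor.ExtractLinks(pageSource)` should then give you the link targets as plain strings. The null check matters because an XPath query with no matches does not return an empty list.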

Francis B.
A: 

Just by Googling, I found this:

Free open source HTML parser all in .NET

http://www.majestic12.co.uk/projects/html_parser.php

It seems to be very complete and, best of all, it's free.

From the site:

Free .NET HTML parser (C#) is an open source high-performance .NET C# module that was created to parse HTML for links, indexing and other purposes. Full source code (~5k lines) is available under BSD license (this means you can use it in your commercial applications). This cross-platform code is verified to run very well under Mono. The parser is 100% self-contained managed code that does not depend on any external DLLs apart from core .NET libraries. We use this parser to process well over 3 TB of HTML every day.

backslash17
+3  A: 

Try SgmlReader. Despite the name, it handles not only SGML but HTML as well. It can convert even malformed HTML into an XmlDocument object, from which it is then very easy to read attributes and values.
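
Roughly, the conversion could look like the sketch below. It assumes the SgmlReader library (namespace `Sgml`) is referenced; `HtmlToXml` and `Load` are hypothetical names used only for this example.

```csharp
using System.IO;
using System.Xml;
using Sgml; // assumption: the SgmlReader library is referenced

public static class HtmlToXml
{
    // Hypothetical helper: parses possibly malformed HTML into an XmlDocument.
    public static XmlDocument Load(string html)
    {
        var sgmlReader = new SgmlReader();
        sgmlReader.DocType = "HTML";                      // use the built-in HTML DTD
        sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
        sgmlReader.CaseFolding = CaseFolding.ToLower;     // normalize tag names to lower case
        sgmlReader.InputStream = new StringReader(html);

        var doc = new XmlDocument();
        doc.PreserveWhitespace = true;
        doc.XmlResolver = null;                           // do not fetch external DTDs or entities
        doc.Load(sgmlReader);                             // SgmlReader is an XmlReader
        return doc;
    }
}
```

Once you have the `XmlDocument`, the usual `SelectNodes` / `SelectSingleNode` XPath calls and the `Attributes` collection work as they would on any other XML document.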

Tamás Szelei