This must be the 20th duplicate or so; here is one: Looking for C# HTML parser
I'm looking for an open source, fast, W3C-equivalent HTML/XHTML parser for C# without native DLLs. Thanks.
Just by Googling, I found this:
Free open source HTML parser all in .NET
http://www.majestic12.co.uk/projects/html_parser.php
It seems to be very complete and, best of all, it is free.
From the site:
Free .NET HTML parser (C#) is an open source high-performance .NET C# module that was created to parse HTML for links, indexing and other purposes. Full source code (~5k lines) is available under BSD license (this means you can use it in your commercial applications). This cross-platform code is verified to run very well under Mono. The parser is 100% self-contained managed code that does not depend on any external DLLs apart from core .NET libraries. We use this parser to process well over 3 TB of HTML every day.
Try SgmlReader. Despite what the name suggests, it is not limited to SGML; it can parse HTML as well. It converts even malformed HTML into an XmlDocument object, from which attributes and values are then easy to read.
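For illustration, here is a minimal sketch of that approach. It assumes the SgmlReader assembly is referenced and exposes the `Sgml.SgmlReader` class with the `DocType`, `CaseFolding` and `InputStream` properties; the sample markup and the XPath query are made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using Sgml; // assumption: the SgmlReader library is referenced by the project

class Program
{
    static void Main()
    {
        // Hypothetical input, deliberately malformed:
        // the <p> is never closed and the href value is unquoted.
        string html = "<html><body><p>Hello <a href=page.html>link</a></body></html>";

        var reader = new SgmlReader();
        reader.DocType = "HTML";                     // use the built-in HTML DTD
        reader.CaseFolding = CaseFolding.ToLower;    // normalize tag names to lower case
        reader.InputStream = new StringReader(html); // any TextReader works here

        // SgmlReader is an XmlReader, so XmlDocument can load from it directly.
        var doc = new XmlDocument();
        doc.Load(reader);

        // Once loaded, the usual XML APIs apply, e.g. XPath over the repaired tree.
        foreach (XmlNode link in doc.SelectNodes("//a[@href]"))
        {
            Console.WriteLine(link.Attributes["href"].Value);
        }
    }
}
```

Because the result is an ordinary XmlDocument, the rest of the code does not need to know that the input was HTML rather than XML.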