I'm looking for a .NET library that can generate a clean Xml tree, ideally System.Xml.XmlDocument, from invalid HTML code. I.E. it should make the kind of best effort guesses, repairs, and substitutions browsers do when confronted with this situation, and generate a pretend XmlDocument. The library should also be well-maintained. :)
I realize this is a lot (too much?) to ask, and I would appreciate any useful leads. There seem to be a fair number of implementations of this for Java, but I would rather not generate my own bindings. So far for .NET I have found http://www.majestic12.co.uk/projects/html_parser.php and http://users.rcn.com/creitzel/tidy.html#dotnet, and http://sourceforge.net/projects/tidyfornet .
I have not yet built or tested any of these, but from the (sparse) docs and rare updates they do not seem like they have what I'm looking for. So what recommendations do you have, either among these choices, or from your past experience.