Hi,

I have a C# application that receives an HTML file. I want to parse and validate it, so that the output is either a list of errors or a confirmation that the HTML is valid.

Does anyone have an idea how I can do this?

Thanks.

A: 

This is relevant to your question:

http://stackoverflow.com/questions/100358/looking-for-c-html-parser

Dave
Not really. That is looking for something that can recover from errors, not test for them.
David Dorward
Yes it is; the errors can be recovered from many of the options listed within.
Dave
A bit more detail about them would be nice. I don't think hunting through the answers reveals that information, so people would have to examine the documentation for each of them in turn.
David Dorward
A: 

There is an obscure DLL from framework version 1.0 (!), Microsoft.mshtml.dll, and that is the only way in the framework to deal with the DOM. If the HTML is XHTML and valid XML, then you can use XML, but otherwise this is the only option.
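
For the XHTML case, a minimal sketch of the XML route could look like the following. It only checks that the markup is well-formed XML, not that it is valid XHTML against a DTD, and the reader stops at the first problem, so the list holds at most one entry. The class and method names are made up for illustration. Because the DOCTYPE is ignored here (to avoid fetching the DTD), named entities such as `&nbsp;` would be reported as errors.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XhtmlCheck
{
    // Hypothetical helper: returns an empty list when the markup
    // parses as well-formed XML, otherwise the first parse error.
    public static List<string> FindWellFormednessErrors(string markup)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings
        {
            // Skip the DOCTYPE so that no external DTD is downloaded.
            DtdProcessing = DtdProcessing.Ignore,
            XmlResolver = null
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(markup), settings))
            {
                // Reading to the end forces the whole document to be parsed.
                while (reader.Read()) { }
            }
        }
        catch (XmlException ex)
        {
            errors.Add(string.Format("Line {0}, position {1}: {2}",
                ex.LineNumber, ex.LinePosition, ex.Message));
        }

        return errors;
    }
}
```

Calling `XhtmlCheck.FindWellFormednessErrors(html)` and checking whether the returned list is empty gives a yes/no answer for well-formedness. Ordinary (non-XHTML) HTML will usually fail this check even when it is perfectly valid HTML.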

Aliostad
I'd be amazed that that was the *only* way to deal with DOM.
David Dorward
In the framework Mr negative...
Aliostad
Hmmm, explain to me how you can validate a very elaborate HTML file with XML. I thought about that too, and I don't think it's the best way.
Jeff Norman
In what framework? Nobody mentioned a framework. (Oh, and must we resort to name calling?)
David Dorward
It's not so obscure; it's the PIA for Internet Explorer. It's not part of the framework, it's a COM interop library. Whether IE is a good validator for HTML is, ahem, debatable.
Hans Passant
+3  A: 

I'd run a local instance of the W3C Markup Validation Service and communicate with it via its API.
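
A rough sketch of what that could look like from C#, assuming the local install exposes the same `check` endpoint as the public validator and accepts the `fragment` and `output=soap12` parameters described in the validator's web service API documentation. The URL below is a placeholder for your own install, and the class and method names are made up for illustration. The per-error details are in the SOAP body, which this sketch returns unparsed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public sealed class ValidationSummary
{
    public string Status { get; set; }      // e.g. "Valid", "Invalid" or "Abort"
    public string ErrorCount { get; set; }  // raw header text, null if the header is absent
    public string SoapBody { get; set; }    // SOAP 1.2 report containing the individual messages
}

public static class LocalValidatorClient
{
    // Assumption: address of a locally installed validator; change it to match your setup.
    public const string CheckUrl = "http://localhost/w3c-validator/check";

    // Form fields sent to the validator.
    public static Dictionary<string, string> BuildRequestFields(string html)
    {
        return new Dictionary<string, string>
        {
            { "fragment", html },    // the document source to validate
            { "output", "soap12" }   // ask for the machine-readable report
        };
    }

    public static async Task<ValidationSummary> ValidateAsync(string html)
    {
        using (var client = new HttpClient())
        using (var form = new FormUrlEncodedContent(BuildRequestFields(html)))
        using (var response = await client.PostAsync(CheckUrl, form))
        {
            response.EnsureSuccessStatusCode();
            return new ValidationSummary
            {
                Status = FirstHeader(response, "X-W3C-Validator-Status"),
                ErrorCount = FirstHeader(response, "X-W3C-Validator-Errors"),
                SoapBody = await response.Content.ReadAsStringAsync()
            };
        }
    }

    // Returns the first value of a response header, or null when it is missing.
    private static string FirstHeader(HttpResponseMessage response, string name)
    {
        IEnumerable<string> values;
        return response.Headers.TryGetValues(name, out values) ? values.FirstOrDefault() : null;
    }
}
```

For very large documents, a multipart file upload may be a better fit than a URL-encoded form; check the API documentation for the upload parameter before relying on it.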

David Dorward
I was not aware there was an API for this, nice find.
Dave
+1  A: 

You can use HTML Tidy. There is a wrapper for .NET called TidyManaged.

gcores
TidyManaged does not provide a working DLL.
Jeff Norman
Have you tried here? http://github.com/markbeaton/TidyManaged/downloads
gcores