Duplicate: Looking for C# HTML parser. Please close.
Can you recommend a library for reading HTML files as XML in .NET? I'd prefer to work with XML objects rather than raw text. Ideally, it should also fix HTML formatting errors.
http://www.codeplex.com/htmlagilitypack
Duplicate? http://stackoverflow.com/questions/100358/looking-for-c-html-parser
You may want to rethink this. HTML and XML are not equivalent.
A good example of this is self-closing tags.
The XML standard specifies that a self-closing tag looks like the following:
<br/>
while the HTML standard writes elements that have no content as single tags, with no closing slash:
<br>
<link rel="...">
In SGML-based HTML (HTML 4 and earlier), using the XML syntax is technically a violation, as /> has a different meaning there.
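To see the difference in practice, here is a minimal sketch. It is in Python rather than .NET, purely because the standard library ships both an XML parser and an HTML parser, so the two can be compared side by side. The helper names parses_as_xml and html_start_tags are made up for this illustration. The same kind of failure is what you should expect from any strict XML parser fed ordinary HTML.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Records every start tag the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup: str) -> list:
    """Return the start tags seen by the HTML parser (hypothetical helper)."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


xml_style = "<p>line one<br/>line two</p>"
html_style = "<p>line one<br>line two</p>"

# A strict XML parser accepts only the self-closing form.
print(parses_as_xml(xml_style))
print(parses_as_xml(html_style))

# An HTML parser copes with the bare <br>.
print(html_start_tags(html_style))
```

The bare <br> makes the XML parser fail with a mismatched-tag error, which is why a dedicated HTML parser that repairs the markup first is usually needed before treating a page as XML.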
There are more examples of these issues in the following article.