Is there a .NET class for reading and manipulating HTML other than System.Windows.Forms.HtmlDocument?
If not, are there any open source libraries for this?
I would do something like this if the markup is XHTML-compliant:
System.Xml.XmlDocument xDoc = new System.Xml.XmlDocument();
xDoc.LoadXml(html);
And edit it that way. If it needs some cleaning up (conversion to XHTML), you can use HtmlTidy or NTidy. For example, with an HtmlTidy wrapper:
string input = "<p>broken html<br <img src=test></div>";
HtmlTidy tidy = new HtmlTidy();
string output = tidy.CleanHtml(input, HtmlTidyOptions.ConvertToXhtml);
XmlDocument doc = new XmlDocument();
doc.LoadXml(output);
EDIT: the broken markup above will be converted to XHTML before it is loaded.
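Once the markup is well-formed, editing it is ordinary XmlDocument work. Here is a minimal sketch of that step, assuming the XHTML has no default namespace and no DOCTYPE (with `xmlns="http://www.w3.org/1999/xhtml"` the XPath would need an XmlNamespaceManager). The class and method names are made up for illustration.

```csharp
using System;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class XhtmlEditor
{
    // Sets a class attribute on every <p> element and returns the serialized markup.
    public static string AddClassToParagraphs(string xhtml, string className)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

        // Assumption: elements are in no namespace, so a plain XPath is enough.
        foreach (XmlElement p in doc.SelectNodes("//p"))
        {
            p.SetAttribute("class", className);
        }

        return doc.OuterXml;
    }

    public static void Main()
    {
        string xhtml = "<div><p>first</p><p>second</p></div>";
        Console.WriteLine(AddClassToParagraphs(xhtml, "intro"));
    }
}
```

The same pattern (SelectNodes, then modify the elements) covers most simple edits, but it only works on input that LoadXml accepts, which is why the tidy step comes first.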
Why don't you like System.Windows.Forms.HtmlDocument and Microsoft.mshtml?
You could use the MSHTML library. It is COM/ActiveX, but if you are using Visual Studio, it will create a managed wrapper for you automatically when you add the reference.
If you only need to emit HTML from an ASP.NET Web Forms page, you can always use a LiteralControl:
PlaceHolder.Controls.Add(new LiteralControl("<div>some html</div>"));
It seems that the best option for parsing HTML in .NET apps is the Html Agility Pack library, found on CodePlex. It provides full DOM access to the HTML and is very straightforward to use.
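To give an idea of what that looks like, here is a rough sketch, assuming the HtmlAgilityPack package is referenced in the project. The LinkRewriter class and its method are hypothetical names for this example. Only HtmlDocument, LoadHtml, DocumentNode, SelectNodes, SetAttributeValue and OuterHtml come from the library.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper for illustration.
public static class LinkRewriter
{
    // Adds target="_blank" to every link that has an href, even in malformed HTML.
    public static string AddTargetBlank(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser, no XHTML conversion needed

        // SelectNodes returns null (not an empty list) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                link.SetAttributeValue("target", "_blank");
            }
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // Deliberately sloppy input: unquoted attribute, unclosed tags.
        string html = "<p>See <a href=test.html>this page</a><br>";
        Console.WriteLine(AddTargetBlank(html));
    }
}
```

Unlike the XmlDocument route, this does not require the input to be well-formed first, which is the main reason to reach for it when the HTML comes from the wild.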