I'm just looking for a really easy way to clean up some HTML (possibly with embedded JavaScript). I tried two different HtmlTidy .NET ports, and both are throwing exceptions...
Sorry, by "clean" I mean "indent". The HTML is not malformed at all; it's XHTML strict.
I finally got something working with SgmlReader, but this is seriously the most ridiculous chunk of code ever just to indent some HTML:
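using System.IO;
using System.Xml;
using Sgml; // SgmlReader library

private static string FormatHtml(string input)
{
    // SgmlReader parses the HTML and exposes it as an XmlReader
    using (var sgml = new SgmlReader { DocType = "HTML", InputStream = new StringReader(input) })
    using (var sw = new StringWriter())
    using (var xw = new XmlTextWriter(sw) { Formatting = Formatting.Indented, Indentation = 2 })
    {
        sgml.Read(); // move to the first node
        while (!sgml.EOF)
            xw.WriteNode(sgml, true); // copy the current node (with its subtree) and advance
        xw.Flush(); // push buffered output into the StringWriter before reading it
        return sw.ToString(); // sw is only in scope inside the using block
    }
}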
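Because the input is described as XHTML strict (i.e. already well-formed XML), a possibly simpler route is to skip SgmlReader and let System.Xml do the re-indenting. This is only a sketch, not tested against real pages, and it assumes two things: the markup uses no named entities such as `&nbsp;` (the built-in parser rejects them because the XHTML DTD is never loaded here), and dropping the DOCTYPE line is acceptable. The `XhtmlIndenter` class name is made up for illustration.

```csharp
using System.IO;
using System.Xml;

public static class XhtmlIndenter
{
    // Hypothetical helper: re-indents markup that is already well-formed XML.
    // Assumes no named entities like &nbsp; and that losing the DOCTYPE is fine.
    public static string Indent(string input)
    {
        var readerSettings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore, // skip the DOCTYPE; no DTD is fetched or parsed
            IgnoreWhitespace = true               // discard the old indentation
        };
        var writerSettings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "  ",       // two spaces, like Indentation = 2 above
            OmitXmlDeclaration = true // don't prepend <?xml ...?>
        };

        using (var sr = new StringReader(input))
        using (var xr = XmlReader.Create(sr, readerSettings))
        using (var sw = new StringWriter())
        {
            using (var xw = XmlWriter.Create(sw, writerSettings))
            {
                // With the reader still in its initial state, WriteNode copies
                // the whole document, so no explicit Read/EOF loop is needed.
                xw.WriteNode(xr, true);
            } // disposing the writer flushes it into sw
            return sw.ToString();
        }
    }
}
```

Embedded JavaScript has to follow the XHTML rules for this to work: any `<` or `&` inside a `<script>` must already be escaped or wrapped in a CDATA section, otherwise `XmlReader` throws. CDATA sections are copied through unchanged.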