Hello,

I'm trying to find a way to indent an HTML file. So far I've been using XmlDocument together with an XmlTextWriter.

However, I can't get it to work for HTML documents, because the parser checks the doctype and tries to download the DTD.

Is there a "dumb" indenting mechanism that doesn't validate or check the document and just does best-effort indentation? The files are 4-10 MB in size and autogenerated. We have to handle this internally; it's fine if the user has to wait, I just want to avoid forking a new process, etc.

Here's my code for reference:

        using (MemoryStream ms = new MemoryStream())
        using (XmlTextWriter xtw = new XmlTextWriter(ms, Encoding.Unicode))
        {
            XmlDocument doc = new XmlDocument();
            // Load the unformatted XML text string into an instance
            // of the XML Document Object Model (DOM)
            doc.LoadXml(content);

            // Set the formatting property of the XML Text Writer to indented
            // the text writer is where the indenting will be performed
            xtw.Formatting = Formatting.Indented;

            // write dom xml to the xmltextwriter
            doc.WriteContentTo(xtw);

            // Flush the contents of the text writer
            // to the memory stream, which is simply a memory file
            xtw.Flush();

            // set to start of the memory stream (file)
            ms.Seek(0, SeekOrigin.Begin);

            // create a reader to read the contents of
            // the memory stream (file)
            using (StreamReader sr = new StreamReader(ms))
                return sr.ReadToEnd();
        }

Essentially, right now I use a MemoryStream, an XmlTextWriter and an XmlDocument; once the document is indented, I read it back from the MemoryStream and return it as a string. Failures happen for XHTML documents and some HTML 4 documents because the parser tries to fetch the DTDs. I tried setting XmlResolver to null, but to no avail :(

A: 

Without access to the specific X[H]TML causing the problems, it's hard to know if this will work, but have you tried using XDocument instead?

        XDocument xdoc = XDocument.Parse(xml);
        string formatted = xdoc.ToString();
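
If the DTD lookup still gets in the way, one variant worth trying is to feed XDocument through an XmlReader whose settings skip the DOCTYPE and have no resolver. This is only a minimal sketch: the class and method names (`MarkupIndenter`, `Indent`) are made up for illustration, and it assumes the input is well-formed XML. With `DtdProcessing.Ignore` the DOCTYPE is dropped from the output, and entities that are only declared in the DTD (such as `&nbsp;`) will most likely cause an XmlException, so tag-soup HTML 4 would still need a different tool.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of the original post.
static class MarkupIndenter
{
    public static string Indent(string content)
    {
        var settings = new XmlReaderSettings
        {
            // Skip the DOCTYPE instead of parsing or downloading it.
            DtdProcessing = DtdProcessing.Ignore,
            // No resolver, so no external resource is ever fetched.
            XmlResolver = null
        };

        using (var reader = XmlReader.Create(new StringReader(content), settings))
        {
            // Load drops insignificant whitespace by default,
            // and ToString() writes the tree back out indented.
            XDocument xdoc = XDocument.Load(reader);
            return xdoc.ToString();
        }
    }
}
```

Calling `MarkupIndenter.Indent(content)` returns the indented markup as a string, so it could stand in for the MemoryStream/XmlTextWriter round trip in the question.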
Aaronaught