I've got unformatted HTML in a string.

I am trying to format it nicely and output the formatted HTML back into a string. I've been trying to use System.Web.UI.HtmlTextWriter, to no avail:

    System.IO.StringWriter wString = new System.IO.StringWriter();
    System.Web.UI.HtmlTextWriter wHtml = new System.Web.UI.HtmlTextWriter(wString);

    wHtml.Write(sMyUnformattedHtml);

    string sMyFormattedHtml = wString.ToString();

All I get back is the unformatted HTML. Is it possible to achieve what I'm trying to do here?

A: 

There's nothing in the framework that will do what you want.

If the HTML fragment is valid XML you could load it into an XmlDocument and write some code to traverse it and output it formatted how you want.
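
A minimal sketch of that idea, assuming the fragment is well-formed XML with a single root element. Rather than traversing the nodes by hand, it lets an indenting XmlWriter do the formatting. The class name `MarkupFormatter`, the method name `Indent`, and the two-space indent are illustrative choices, not something from the question:

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper, named here only for illustration.
public static class MarkupFormatter
{
    // Returns the markup re-serialized with one nested element per line.
    // Throws XmlException if the input is not well-formed XML.
    public static string Indent(string wellFormedMarkup)
    {
        var document = new XmlDocument();
        document.LoadXml(wellFormedMarkup);

        var settings = new XmlWriterSettings
        {
            Indent = true,             // put child elements on their own lines
            IndentChars = "  ",        // assumption: two spaces per level
            OmitXmlDeclaration = true  // a fragment should not gain an <?xml ...?> header
        };

        var output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            document.Save(writer);
        }
        return output.ToString();
    }
}
```

Calling `MarkupFormatter.Indent("<div><p>Hi</p></div>")` should put the `<p>` element on its own indented line. Typical real-world HTML (unclosed `<br>` tags, entities such as `&nbsp;`) is not well-formed XML and will make `LoadXml` throw, so this only helps with XHTML-style input.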

Andrew Kennan
+2  A: 

You can pass it to Tidy as an external process, or use XmlTextWriter if you are willing to use XHTML instead of HTML.

eed3si9n
A: 

Use EFTidyNet, the managed .NET wrapper for Tidy. It is much simpler than using a batch file to call Tidy and also a lot faster.

Tidy can clean up your HTML and make it look nice, as well as turning it into valid HTML or XHTML.

amdfan
+1  A: 

Here's a function that does exactly that, provided the HTML is well-formed XML:

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Text;
    using System.Xml;

    // Attractively format the XML with consistent indentation.
    public static String PrettyPrint(String XML)
    {
        String Result = "";

        // The using statements dispose the writer and the stream,
        // so no explicit Close() calls are needed.
        using (MemoryStream MS = new MemoryStream())
        using (XmlTextWriter W = new XmlTextWriter(MS, Encoding.Unicode))
        {
            XmlDocument D = new XmlDocument();

            try
            {
                // Load the XmlDocument with the XML.
                D.LoadXml(XML);

                W.Formatting = Formatting.Indented;

                // Write the XML into a formatting XmlTextWriter.
                D.WriteContentTo(W);
                W.Flush();

                // Have to rewind the MemoryStream in order to read
                // its contents.
                MS.Position = 0;

                // Decode with the same encoding the writer used.
                StreamReader SR = new StreamReader(MS, Encoding.Unicode);

                // Extract the text from the StreamReader.
                Result = SR.ReadToEnd();
            }
            catch (XmlException ex)
            {
                // The input was not well-formed XML; the error text
                // is returned in place of the formatted markup.
                Result = ex.ToString();
            }
        }

        Debug.WriteLine(Result);
        return Result;
    }