I have an HTML document stored in memory as a LINQ to XML object tree. How can I serialize an XDocument as HTML, taking into account the idiosyncrasies of HTML?

For example, empty tags such as <br/> should be serialized as <br>, whereas an empty <div/> should be serialized as <div></div>.

HTML output is possible from an XSLT stylesheet, and XmlWriterSettings has an OutputMethod property which can be set to Html. However, the setter is internal, for use by XSLT or Visual Studio, and I can't seem to find a way to serialize arbitrary XML as HTML.

So, short of using XSLT solely for its HTML output capability (i.e. running the document through an otherwise pointless XDocument -> XmlReader -> XSLT -> HTML chain), is there a way to serialize a .NET XDocument to HTML?

+1  A: 

No. The XDocument -> XmlReader -> XSLT chain is the approach you need.

What you are looking for is a specialised serialiser that arbitrarily attaches meaning to tag names like br and div and renders each differently. One would also expect such a serialiser to work in both directions, in other words to be able to read HTML tag soup and generate an XDocument. Such a thing does not exist out of the box.

Feeding an XmlReader to XSLT seems simple enough for the job; ultimately it is just a chain of streams.
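
For illustration, here is a minimal sketch of that chain. It assumes an identity stylesheet whose only job is to declare `method='html'`; the class name `XsltHtmlSerializer` and the method `ToHtml` are hypothetical names, not part of the framework. Note that the XSLT 1.0 HTML output method only applies its special rules to elements in no namespace, so a tree that uses the XHTML namespace would likely still come out as XML.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

// Hypothetical helper: serializes an XDocument through an identity XSLT
// so that the stylesheet's HTML output method is applied.
public static class XsltHtmlSerializer
{
    // Identity transform; it copies every node and attribute unchanged.
    private const string IdentityXslt =
@"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html' indent='no'/>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    public static string ToHtml(XDocument doc)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(IdentityXslt)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringWriter();
        using (var input = doc.CreateReader())
        // OutputSettings reflects the stylesheet's xsl:output element,
        // which is how the writer ends up using the HTML output method.
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }
}
```

Usage would be along the lines of `string html = XsltHtmlSerializer.ToHtml(doc);`. In real code the compiled transform would probably be cached rather than rebuilt on every call.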

AnthonyWJones
The infuriating thing is that there obviously is support in the box *somewhere* for serializing using HTML rules. After all, it works from XSLT, and there is that `internal` property. Also, using XSLT's HTML output adds a (generally useless and incorrect) META tag to the document's head. I'll leave the question open a while longer, but if no one can come up with a better solution, I fear you're correct.
Eamon Nerbonne
+1  A: 

Like you, I'm really surprised that the HTML output method isn't exposed, and I don't know of any way round it other than the XSLT route you've already identified. When I faced the same problem a couple of years ago, I wrote an XmlWriter wrapper class that forced calls to WriteEndElement to use WriteFullEndElement on the underlying XmlWriter if the tag being processed wasn't in the list {"area", "base", "basefont", "bgsound", "br", "col", "embed", "frame", "hr", "isindex", "image", "img", "input", "link", "meta", "param", "spacer", "wbr" }.

This fixed the <div/> problem and was sufficient for me, as I wanted to write polyglot documents. I didn't find a way to make <br/> appear as <br>, but apart from the output not validating as HTML 4.01, this doesn't cause a real problem. I guess that if you really need this, and don't want to use the XSLT method, you'll have to write your own XmlWriter implementation.
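
A rough sketch of the idea follows. It is not the original wrapper: to keep it short it subclasses `XmlTextWriter` instead of wrapping an arbitrary XmlWriter, and the names `HtmlFriendlyXmlWriter`, `HtmlFriendlySerializer` and `Serialize` are made up for this example. It tracks the open elements on a stack and only lets the tags from the list above self-close.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical writer: forces full end tags for everything except
// the elements that HTML treats as empty.
public class HtmlFriendlyXmlWriter : XmlTextWriter
{
    private static readonly HashSet<string> EmptyElements =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "area", "base", "basefont", "bgsound", "br", "col", "embed",
            "frame", "hr", "isindex", "image", "img", "input", "link",
            "meta", "param", "spacer", "wbr"
        };

    // Names of the currently open elements, innermost on top.
    private readonly Stack<string> openElements = new Stack<string>();

    public HtmlFriendlyXmlWriter(TextWriter output) : base(output) { }

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        openElements.Push(localName);
        base.WriteStartElement(prefix, localName, ns);
    }

    public override void WriteEndElement()
    {
        // Only the known empty elements keep the short form.
        if (EmptyElements.Contains(openElements.Pop()))
            base.WriteEndElement();
        else
            base.WriteFullEndElement();
    }

    public override void WriteFullEndElement()
    {
        openElements.Pop(); // keep the stack in step with the writer
        base.WriteFullEndElement();
    }
}

public static class HtmlFriendlySerializer
{
    public static string Serialize(XDocument doc)
    {
        var output = new StringWriter();
        using (var writer = new HtmlFriendlyXmlWriter(output))
        {
            // Assumption: only the root element is written, which skips
            // the XML declaration (and also any DOCTYPE).
            doc.Root.WriteTo(writer);
        }
        return output.ToString();
    }
}
```

As described above, this only addresses the <div/> case; a br element is still written in self-closing form, not as a bare <br>.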

Alohci