DOMDocument->saveHTML() takes your XML DOM infoset and writes it out as old-school HTML, not XML. You should not use saveHTML() together with an XHTML doctype, as its output won't be well-formed XML.
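As a quick illustration of that point, here is a minimal sketch that serialises a tiny fragment with saveHTML() and then tries to read the result back with an XML parser. The sample markup and variable names are made up for the example.

```php
<?php
// Build a small DOM from well-formed XML input.
$doc = new DOMDocument();
$doc->loadXML('<p>line one<br/>line two</p>');

// Serialise the root element with the HTML serialiser.
$html = $doc->saveHTML($doc->documentElement);
echo $html, "\n"; // the line break comes out as <br>, not <br/>

// Feeding that output to an XML parser fails, because <br> is never closed.
libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
$check = new DOMDocument();
$ok = $check->loadXML($html);
var_dump($ok);
```

The round trip fails on the unclosed <br>, which is exactly why an XHTML doctype on top of saveHTML() output is misleading.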
If you use saveXML() instead, you'll get proper XHTML. It's fine to serve this XML output to standards-compliant browsers if you give it a Content-Type: application/xhtml+xml header. But unfortunately IE6-8 won't be able to read that, as they can still only handle old-school HTML, under the text/html media type.
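One way to act on this is to pick the media type from the request's Accept header. The sketch below is deliberately naive: pick_media_type() is a hypothetical helper, it only does a substring match and ignores q-values, and it assumes that browsers which cannot handle XHTML do not advertise application/xhtml+xml.

```php
<?php
// Hypothetical helper: choose a media type from the raw Accept header.
// Naive substring check only; it does not parse q-values.
function pick_media_type(string $accept): string
{
    return stripos($accept, 'application/xhtml+xml') !== false
        ? 'application/xhtml+xml'
        : 'text/html';
}

// Must run before any output is sent.
$type = pick_media_type($_SERVER['HTTP_ACCEPT'] ?? '');
header('Content-Type: ' . $type . '; charset=utf-8');
```

Note that if you fall back to text/html, the body you send still needs to be something an HTML parser copes with, which is the problem the rest of this answer deals with.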
The usual compromise solution is to serve text/html and use ‘HTML-compatible XHTML’ as outlined in Appendix C of the XHTML 1.0 spec. But sadly there is no PHP DOMDocument->saveXHTML() method to generate the correct output for this.
There are some things you can do to persuade saveXML() to produce HTML-compatible output for some common cases. The main one is that you have to ensure that only elements defined by HTML4 as having an EMPTY content model (<img>, <br> etc) actually do have empty content, causing the self-closing syntax (<img/>) to be used. Other elements must not use the self-closing syntax, so if they're empty you should put a space in their text content to stop them being so:
<script src="x.js"/> <-- no good, confuses HTML parser and breaks page
<script src="x.js"> </script> <-- fine
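The padding step can be automated before calling saveXML(). This is a minimal sketch, not a complete solution: pad_empty_elements() and the HTML4_EMPTY_ELEMENTS constant are names invented here, and the list is the set of elements HTML 4.01 declares with an EMPTY content model.

```php
<?php
// Elements HTML 4.01 declares as EMPTY; these may stay self-closing.
const HTML4_EMPTY_ELEMENTS = [
    'area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
    'img', 'input', 'isindex', 'link', 'meta', 'param',
];

// Hypothetical helper: give every other childless element a single space
// as text content, so the XML serialiser writes <x> </x> instead of <x/>.
function pad_empty_elements(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    // //*[not(node())] selects elements that have no child nodes at all.
    foreach ($xpath->query('//*[not(node())]') as $el) {
        if (!in_array(strtolower($el->localName), HTML4_EMPTY_ELEMENTS, true)) {
            $el->appendChild($doc->createTextNode(' '));
        }
    }
}

$doc = new DOMDocument();
$doc->loadXML('<div><script src="x.js"/><br/></div>');
pad_empty_elements($doc);
$out = $doc->saveXML($doc->documentElement);
echo $out, "\n";
```

Run on the sample above, the script element gains a space and an explicit end tag while <br/> is left alone. PHP's LIBXML_NOEMPTYTAG option for saveXML() is not a substitute here, since it expands every empty element, including <br></br>.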
The other one to look out for is handling of the inline <script> and <style> elements, which are normal elements in XHTML but special CDATA-content elements in HTML. Some /*<![CDATA[*/.../*]]>*/ wrapping is required to make any < or & characters inside them behave mostly-consistently, though note you still have to avoid the ]]> and </ sequences.
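In DOM terms, that wrapping can be built from three child nodes: a text node holding /*, a CDATA section holding the code between */ and /*, and a final text node holding */. The sketch below assumes that layout; set_inline_code() is a hypothetical helper, and it simply refuses input containing the two problem sequences rather than trying to escape them.

```php
<?php
// Hypothetical helper: replace the content of a <script> or <style> element
// with $code wrapped so that it serialises as /*<![CDATA[*/ ... /*]]>*/.
function set_inline_code(DOMElement $el, string $code): void
{
    // These sequences would end the CDATA section (XML) or the element (HTML).
    if (strpos($code, ']]>') !== false || strpos($code, '</') !== false) {
        throw new InvalidArgumentException('Code contains "]]>" or "</"');
    }
    while ($el->firstChild) {
        $el->removeChild($el->firstChild);
    }
    $doc = $el->ownerDocument;
    $el->appendChild($doc->createTextNode('/*'));
    $el->appendChild($doc->createCDATASection("*/\n" . $code . "\n/*"));
    $el->appendChild($doc->createTextNode('*/'));
}

$doc = new DOMDocument();
$doc->loadXML('<script type="text/javascript"/>');
set_inline_code($doc->documentElement, 'if (a < b && c) { go(); }');
$out = $doc->saveXML($doc->documentElement);
echo $out, "\n";
```

An XML parser sees the comment delimiters as plain text around a CDATA section, while an HTML parser sees the CDATA markers sitting inside JavaScript or CSS comments, so the code itself reads the same either way.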
If you want to really do it properly you would have to write your own HTML-compatible-XHTML serialiser. Long-term that would probably be a better option. But for small simple cases, hacking your input so that it doesn't contain anything that would come out the other end of an XML serialiser as incompatible with HTML is probably the quick solution.
That or just suck it up and live with old-school non-XML HTML, obviously.