Even with HTML5 being the path forward for HTML we get two options as developers: XHTML syntax and HTML syntax. I've been using XHTML as my main doctype for 5 or so years so I'm very comfortable with it.

But my question is: given that non-XML syntax will be allowed, is there any reason to stick with valid XML syntax? Do you gain anything going with one over the other, besides preference (compatibility, etc.)? Personally I'll feel a little dirty going back to not closing tags; closing them is second nature to me now. But would I gain something by going back to HTML syntax?

Update: I guess my true question is: is there a reason to switch from XHTML to HTML syntax? I've been using XHTML for years and am not sure if there is a reason to switch back. Browser compatibility (IE was sometimes finicky with the application/xhtml+xml MIME type), etc.?

A: 

The advantage of XHTML syntax is that it is XML. It can be easily parsed, understood and manipulated. The HTML syntax is a lot harder for clients to work with.

But ultimately, it is just a matter of syntax. Both forms are allowed for HTML5.

jalf
That's not true. XML is not easier to parse than HTML 4.01 Strict, provided that both are valid. The reason behind the self-closing tags in XML is that it's a framework for defining markup languages, so one doesn't have to know beforehand which tags are self-closing. Browsers, on the other hand, already know which tags these are, so they know very well that after a <br> they should not expect a </br>. That's all.
Ionuț G. Stan
XPath or XSLT are two ready-made technologies for parsing and manipulating XML. They don't work with HTML. HTML allows more than just unclosed tags, it also allows you to close tags in a different order than they were opened. So no, that's not "all". :)
jalf
HTML 4.01 Strict does not allow you to close tags in a different order. Just because some people did it does not mean it is allowed. The only thing hard about HTML is that it does not enforce draconian rules in the markup. XHTML is either correct or not. So HTML, as defined in the standards, is OK. What we have in the real world is not OK.
Ionuț G. Stan
Furthermore, because people think that what they write is XHTML, when it is in fact invalid HTML, they believe XHTML is easy. But there are thousands, or tens of thousands, of invalid XHTML/HTML pages out there with XHTML transitional doctypes. That's because IE does not support XHTML, so they had to send their markup as text/html. So, no XHTML/XML advantages.
Ionuț G. Stan
@jalf: I personally use XPath and XSLT with HTML. These technologies are independent of XML. They work on the DOM, and both HTML and XML produce an equivalent DOM. HTML 5 does not allow tags to be closed in the wrong order (it's a parse error; HTML 5 never breaks the tree structure).
porneL
+15  A: 

The advantage of XHTML syntax is that it is XML. It can be easily parsed, understood and manipulated. The HTML syntax is a lot harder for clients to work with.

Nonsense! The HTML5 spec defines how to parse HTML in a way that is relatively easy to implement, and off-the-shelf parsers are being developed that can be easily integrated into tool chains. It's even possible for an HTML5 parser to be integrated into an XML tool chain in place of an XML parser.

But what you need to understand is that in practice, you're most likely using HTML anyway, even if you think you're using XHTML based on the DOCTYPE. If your content is being served as text/html, instead of application/xhtml+xml or another XML MIME type, then your content will be processed as HTML.
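As a rough illustration of that rule, here is a minimal Python sketch of how a client might pick a parser from the Content-Type header alone. The function name `parser_for` and the exact matching rule are assumptions made for this example, not something taken from any browser's source.

```python
def parser_for(content_type):
    """Return "xml" or "html" for a Content-Type header value.

    Simplified, hypothetical rule: XML MIME types select the XML parser;
    anything else handled here falls back to the HTML parser.
    The DOCTYPE inside the document is never consulted.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";")[0].strip().lower()
    if mime in ("application/xml", "text/xml") or mime.endswith("+xml"):
        return "xml"
    return "html"
```

So a page with an XHTML 1.0 DOCTYPE that arrives as `text/html; charset=utf-8` would still go down the HTML branch in this sketch.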

With HTML5, you can choose to use HTML-only syntax, meaning that it is only compatible with being served and processed as text/html; it is not well-formed XML. Or use XHTML-only syntax, meaning that it is well-formed XML, but uses XML features that are not compatible with HTML. Or you can write a polyglot document, which is conforming and compatible with both HTML and XHTML processing (in principle, this is conceptually similar to writing XHTML 1.0 that conforms with the Appendix C guidelines).
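To make the polyglot idea concrete, here is a small Python sketch that feeds the same string to an XML parser and to an HTML tokenizer and compares the element names each one sees. The sample document and the helper names (`xml_tags`, `html_tags`, `is_well_formed_xml`) are made up for illustration, and Python's `html.parser` is a lenient tokenizer rather than a full HTML5 parser, so treat this as a rough check only.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical polyglot-style document: well-formed XML that also sticks to
# syntax an HTML parser accepts (the void element is written as <br />).
DOC = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Demo</title></head>'
    '<body><p>Line one<br />Line two</p></body>'
    '</html>'
)


class TagCollector(HTMLParser):
    """Record the name of every start tag the HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def xml_tags(text):
    """Element names in document order as seen by the XML parser."""
    root = ET.fromstring(text)
    # ElementTree stores names as "{namespace}local"; keep the local part.
    return [el.tag.split('}')[-1] for el in root.iter()]


def html_tags(text):
    """Element names in document order as seen by the HTML tokenizer."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


def is_well_formed_xml(text):
    """True if the XML parser accepts the text at all."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

With this sketch, `xml_tags(DOC)` and `html_tags(DOC)` report the same six elements, while HTML-only markup such as `<p>a<br>b</p>` is rejected by `is_well_formed_xml`.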

Lachlan Hunt
Indeed. This is what all browsers have been doing for their whole lives.
Mehrdad Afshari
Lachlan, it is not easy to implement, and you know as well as I do that the number of HTML 5 parsers is still very small compared to XML parsers.
karlcow
@Lachlan, you know very well that HTML 5 is still a draft and subject to change. As I understand it, none of the browsers available to the general public today implement the HTML5 parser spec in full, let alone other user agents. On the other hand, XML parsers are ubiquitous. Maybe one day HTML5 parsers will be as convenient to use as XML ones, but not yet. Maybe one day IE will implement application/xhtml+xml and web authors can, if they wish, leave text/html behind. In the meantime, if one wishes, as I do, to parse back one's own web pages, using a polyglot document is the way to go.
Alohci
karlcow, I said *relatively* easy to implement, and given that html5lib was implemented by a group of people with little to no prior experience implementing a parser, simply by following the spec, I think my claim is valid. Alohci, yes, I am aware of the instability of HTML5 due to its WD status. But I was addressing the bogus claim that parsing HTML is a lot harder than parsing XML. It's not really relevant that browsers haven't yet finished migrating to fully conforming HTML5 parsers, as their existing parsers handle real-world HTML sufficiently well in practice anyway.
Lachlan Hunt
+1  A: 

When using XHTML you can mix it with other XML content, f.e. MathML, SVG or your own proprietary format, by just changing namespace at some point. Also, you can embed XHTML inside other XML documents.

(well, actually MathML and SVG can be used in non-XML HTML5 too, but they are special-cased)
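A small Python sketch of what "changing namespace at some point" looks like to an XML tool: the document below embeds an SVG fragment inside XHTML, and a generic XML parser can tell the two vocabularies apart purely by namespace. The sample document and the helper name `namespaces_used` are invented for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical XHTML document that switches to the SVG namespace midway.
DOC = f"""<html xmlns="{XHTML_NS}">
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG_NS}" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""


def namespaces_used(text):
    """Return the set of namespace URIs that element names belong to."""
    root = ET.fromstring(text)
    # ElementTree stores names as "{namespace}local"; extract the namespace.
    return {el.tag[1:].split('}')[0] for el in root.iter()}


def svg_circles(text):
    """Find SVG <circle> elements wherever they occur in the document."""
    root = ET.fromstring(text)
    return root.findall(f".//{{{SVG_NS}}}circle")
```

Here `namespaces_used(DOC)` yields both the XHTML and the SVG namespace, and `svg_circles(DOC)` finds the embedded circle without knowing anything about the surrounding XHTML.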

liori
"When using XHTML you can mix it with other XML content, f.e. MathML, SVG or your own proprietary format, by just changing namespace at some point." <- except for IE.
Ionuț G. Stan
IE doesn't support HTML5 in the first place though, does it?
jalf
I always had the freedom not to call IE a web browser. HTML5 was designed for compatibility, so at least some parts of a web page will work.
liori
@jalf, it does. Well, depends what you mean by support. HTML5 is designed to be backwards compatible. It follows the principle of graceful degradation.
Ionuț G. Stan
+4  A: 

The HTML5 draft is very clear about which syntax to use:

  • use HTML syntax when sending pages as text/html
  • use XHTML syntax when sending pages as application/xhtml+xml

Reference: http://dev.w3.org/html5/spec/Overview.html#authors-using-xhtml

Ionuț G. Stan
true, but it doesn't really answer the question of which should be preferred when you have the option of using either content type.
jalf
It does: use HTML when it is text/html and XHTML when it is application/xhtml+xml. While you can use XHTML with text/html, that is not recommended, and the other way around, HTML with application/xhtml+xml, is not possible.
Ionuț G. Stan
Sorry, but it doesn't really answer my question. I get that the MIME type is what tells the browser which syntax to use -- I was asking which to use myself. I can set the MIME type to be whatever I want, so I know *how* to switch between the two.
Parrots
@Parrots, but you know that IE does not support application/xhtml+xml, right? So I doubt you can use whatever MIME type you want, except in a few cases.
Ionuț G. Stan
A: 

Update: I guess my true question is: is there a reason to switch from XHTML to HTML syntax? I've been using XHTML for years and am not sure if there is a reason to switch back. Browser compatibility (IE was sometimes finicky with the application/xhtml+xml MIME type), etc.?

You really have to consider two things: the language you are writing and the language you are sending. The Web is defined by 3 components:

  • URI
  • A resource - Markup Language (document)
  • A protocol - HTTP (tool for managing information space)

You can write a document with an XML syntax on your desktop such as using XHTML. In this specific environment, if you give the extension ".xhtml" to the filename and open it with your local browser, it will be parsed as XML. If you give the extension ".html" to the filename, it will be parsed as HTML. Basically in your authoring tool, it is XML, but this doesn't matter anymore once you process it with a tool.

On the Web, your resource identified by a URI will be sent with a specific MIME type; most of the time, these days, people are using text/html. The MIME type defines how the client (browser, search engine bot, etc.) must process your document. If you are using an XML syntax but send it with text/html, the document will be processed by an HTML parser.

To send your documents over the wire as XML, you have to configure your server to send them as application/xhtml+xml. (Note that IE8 and earlier versions do not understand application/xhtml+xml and will offer to save the file instead.)
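One common workaround is to look at the request's Accept header and only send application/xhtml+xml to clients that list it. The Python sketch below shows the idea; the function name `choose_content_type` is hypothetical, and the logic deliberately ignores q-values and wildcards, which a real server would need to handle.

```python
def choose_content_type(accept_header):
    """Pick a response MIME type from the request's Accept header.

    Simplified assumption: serve XHTML only when the client lists
    application/xhtml+xml explicitly; otherwise fall back to text/html.
    """
    # Strip parameters such as ";q=0.9" from each listed media type.
    accepted = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    if "application/xhtml+xml" in accepted:
        return "application/xhtml+xml"
    return "text/html"
```

A client that sends only `*/*` (or omits the XHTML type) gets text/html in this sketch, so it is never handed a MIME type it may not understand.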

The HTML 5 abstract model has been designed in such a way that you can almost write it with either an HTML syntax or an XML syntax in text/html. Almost, because even if you write with an XML syntax (closing empty elements, quotes around attributes, etc.) you will run into trouble with complex pages that use scripting and namespaces, due to the different ways XML parsers and HTML parsers deal with those.

karlcow
+2  A: 

You shouldn't use XHTML to serve content on the Web (or any network including Internet Explorer clients); see Sending XHTML as text/html Considered Harmful for the full rationale.

Thomas Broyer
That's not true, there are cases where XHTML should be served on the web as application/xhtml+xml, when you specifically want/need to use some of the benefits of XHTML (see further down the article for examples). Usually, though, you will be better off serving HTML as text/html.
Alistair Knock
+6  A: 

I guess my true question is: is there a reason to switch from XHTML to HTML syntax? I've been using XHTML for years and am not sure if there is a reason to switch back. Browser compatibility (IE was sometimes finicky with the application/xhtml+xml MIME type), etc.?

As mentioned in a previous answer, text/html gets parsed as HTML and application/xhtml+xml gets parsed as XML. Thus, you should use the syntax that matches the MIME type you use.

If you are now serving text/html but using XHTML syntax, then you should fix your content to use the HTML5 syntax. You may already be close, since HTML5 allows the XMLesque /> empty element syntax for void elements (elements that are always empty, such as img and br).
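When checking how close existing XHTML-style markup is, the main thing to look for is `/>` on elements that are not void: an HTML parser ignores the slash there and treats the tag as a plain start tag. The Python sketch below flags such tags with a regular expression. The `VOID` set is a deliberately partial list for illustration, and the helper name `risky_self_closing_tags` is made up; a regex is only a rough heuristic, not a real parser.

```python
import re

# Partial, illustrative list of void elements; extend it from the HTML spec.
VOID = {"br", "hr", "img", "input", "link", "meta"}

# Matches a start tag that ends in "/>", capturing the element name.
SELF_CLOSING = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>]*/>")


def risky_self_closing_tags(markup):
    """Return names of non-void elements written with XML-style '/>'.

    In text/html the '/' on these tags is ignored, so the element stays
    open until an explicit end tag (or the parser closes it implicitly).
    """
    return [
        match.group(1).lower()
        for match in SELF_CLOSING.finditer(markup)
        if match.group(1).lower() not in VOID
    ]
```

For example, in `<p>a<br />b</p><div class="x" /><img src="a.png" />` only the `div` is reported; the `br` and `img` are fine as written.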

If you are now using application/xhtml+xml, IE support would be a reason to switch to text/html and the HTML syntax if you care about supporting IE.

Trying to write polyglot documents that are correct HTML5 and XHTML5 (for serving different MIME types to different browsers with the same payload bytes) is harder than it seems at first sight and not worth the trouble.

hsivonen
A: 

Most of the benefits of XHTML have failed to materialise. While I wouldn't recommend it for new projects, XHTML served as text/html seems to be quite manageable and widespread, as long as you follow the compatibility guidelines. It probably isn't worthwhile changing any significant projects back to the HTML serialisation.

Casebash