By my understanding, neither XHTML (1.0, 1.1) nor XHTML 5 requires a DTD. If this is true, how will a browser differentiate between the two?

I can only assume that when the browser vendors support (X)HTML 5, all XHTML will be interpreted under XHTML 5 rules (assuming XHTML 5 is a superset of XHTML 1.0). Is this how it's going to work?

+3  A: 

There is no XHTML 5. Currently there is HTML 4.01 and XHTML 1.0. There will be no XHTML 2.0. There will only be HTML 5. HTML 5 is not an XML standard (meaning an HTML 5 document is not an XML document).

Perhaps you're looking at HTML 5 + XML = XHTML 5. I guess you can express HTML 5 as XML but as far as I know this is non-standard. More specifically, this is just a serialization method for the document tree rather than a standard.

To clarify this issue, take a look at "HTML 5 and XHTML 5 - one vocabulary, two serializations". Even the title says "one vocabulary, two serializations". And from "Conversation With X/HTML 5 Team":

The XHTML 5 spec says that "generally speaking, authors are discouraged from trying to use XML on the Web". Why write an XML spec like XHTML 5 and then discourage authors from using it? Why not just drop support for XML (XHTML 5)?

Some people are going to use XML with HTML 5 whatever we do. It's a simple thing to do — XML is a metalanguage for describing tree structures, HTML 5 is a tree structure, it's obvious that XML can be used to describe HTML 5. The problem is that if we don't specify it, then everyone who thinks it is obvious and goes ahead and does it will do it in a slightly different way, and we'll have an interoperability nightmare. So instead we bite the bullet and define how it must work if people do it.

XHTML 1.0 was a standard. It differed from HTML 4. XHTML 5, if you can call it that, is nothing more than a representation of HTML 5 documents in XML form.
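
As a rough illustration of "one vocabulary, two serializations", here is a minimal Python sketch that reads the same small document written once in HTML syntax and once in XML syntax, and compares the element names found in each. The sample documents and the helper names (`StartTagCollector`, `element_names_from_html`, `element_names_from_xml`) are made up for this example. Note also that Python's `html.parser` is a simple tokenizer, not an HTML 5 conforming parser, so this only shows the idea and not how a browser actually works.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample: the same document in the HTML serialization...
html_serialization = (
    "<!DOCTYPE html>"
    "<html><head><title>Hi</title></head>"
    "<body><p>Hello<br>world</p></body></html>"
)

# ...and in the XML serialization (namespace declared, <br/> self-closed).
xml_serialization = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Hi</title></head>"
    "<body><p>Hello<br/>world</p></body></html>"
)


class StartTagCollector(HTMLParser):
    """Records the name of every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.append(tag)


def element_names_from_html(text):
    collector = StartTagCollector()
    collector.feed(text)
    collector.close()
    return collector.names


def element_names_from_xml(text):
    root = ET.fromstring(text)
    prefix = "{" + XHTML_NS + "}"
    # Keep only elements in the XHTML namespace and strip the namespace part.
    return [el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)]


if __name__ == "__main__":
    print(element_names_from_html(html_serialization))
    print(element_names_from_xml(xml_serialization))
```

Both calls walk the document in order, so the two lists can be compared directly; the syntax differs, the vocabulary does not.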

cletus
You're wrong. XHTML5 is part of the HTML 5 specification. See http://www.w3.org/TR/html5/introduction.html#html-vs-xhtml
Pavel Minaev
That's not quite correct. There is an XML serialization method for HTML5 and despite it having XML-specific features (like namespaces), it *is* the HTML5 standard. The standard goes so far as to say that XML usage is strongly discouraged and really only included it because some people will do it anyway.
cletus
I see what you mean. w3 only mentions HTML 5 + XML as a serialisation method (http://www.w3.org/TR/2008/WD-html5-diff-20080122/#syntax ), so HTML 5 and "XHTML 5" or HTML 5 + XML are the same languages, just with two serialisation methods. I think my question is still relevant though.
Abignale
Minor offtopic question: Aside from legacy support, why would HTML 5 in XML be so strongly discouraged? I've always preferred the strictness of the syntax and the feeling that an XML parser is going to get through it faster than tag soup.
Abignale
@Abignale. See the oft quoted http://hixie.ch/advocacy/xhtml for the reasons behind discouraging XHTML. Put simply, getting XHTML right is so difficult that of those who attempt it, most get it wrong. And there's little to be gained from doing so. Unless you have a specific reason, e.g. wanting to parse the generated pages, it's really not worth the effort. Contrary to widely held belief, there is no evidence that XML parsers are faster than HTML ones. However, if you need a parser, XML parsers are much more widely available, stable and well tested than equivalent HTML ones.
Alohci
@Cletus. Apart from the "authors are discouraged..." statement that you quote, which frankly is little more than Hixie's personal opinion and by no means a consensus of the HTML WG, the HTML5 and XHTML5 serialisations have equal status within the HTML 5 draft. HTML5 is a representation of the HTML 5 infoset in HTML form; XHTML5 is a representation of the HTML 5 infoset in XHTML form.
Alohci
@All. Unfortunately the HTML 5 nomenclature is really confusing. HTML<space>5 - is the language and processing spec. HTML5 - i.e. without the space - is the HTML serialization. XHTML5 - i.e. without the space - is the XHTML serialization. There is no XHTML<space>5.
Alohci
+1  A: 
David Dorward
Not sure what you mean by that. Browsers don't care about the meaning of elements, just how to process them. The HTML 5 draft retains processing backward compatibility for browsers, the changed semantics only affect tools that attempt to extract meaning from the markup. However, AFAIK the semantics of elements in HTML 5 are more narrow than that of HTML 4, so the ultimate outcome for any semantic extraction tool will be that the HTML 4 rules still apply, since they will be unable to tell when the more narrow meaning applies.
Alohci
A: 

It would be possible to distinguish by DOCTYPE (which is different for HTML5 and XHTML 1.x), but its presence is specifically non-mandatory in XHTML5, and the element namespace is the same. So, in general, there's no good way to distinguish them. If you want to write portable XHTML5, I guess providing a DOCTYPE is your best bet.
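
To make the "same namespace" point concrete, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. It parses one document carrying an XHTML 1.0 Strict DOCTYPE and one with no DOCTYPE at all, and shows that an XML parser reports the same namespaced root element for both. The two sample documents and the `root_name` helper are assumptions made up for this illustration; this is not how any particular browser is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical document authored as XHTML 1.0 Strict.
XHTML10_DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>XHTML 1.0</title></head>
  <body><p>Hello</p></body>
</html>"""

# Hypothetical document authored as XHTML5, with no DOCTYPE.
XHTML5_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>XHTML5</title></head>
  <body><p>Hello</p></body>
</html>"""


def root_name(document):
    """Return the root element name in ElementTree's '{namespace}local' form."""
    return ET.fromstring(document).tag


if __name__ == "__main__":
    # The parser does not fetch the external DTD; it only sees namespaced elements.
    print(root_name(XHTML10_DOC))
    print(root_name(XHTML5_DOC))
    print(root_name(XHTML10_DOC) == root_name(XHTML5_DOC))
```

From the element tree alone, the two documents look the same at the root, so the DOCTYPE is the only difference visible in the source.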

Pavel Minaev
The Doctype is optional for XHTML family documents (it has to be so you can mix namespaces sanely).
David Dorward
I specifically wrote that DOCTYPE is optional (non-mandatory) in my answer. My point, however, is that if you _do_ write it (and you have that option as the _content author_), chances are good that the browser _will_ treat it as an indicator of XHTML5.
Pavel Minaev
Browsers haven't paid any attention to Doctypes in the past (except as an intelligence test to determine if they should use Quirks or Standards mode), and I don't see that changing.
David Dorward
The whole point of HTML5 is to make browser vendors agree on how things should be done. It's why it's written by vendors in the first place. If something is in the spec, chances are good that it will be used.
Pavel Minaev
I'm not sure I agree with that last statement, but it's moot anyway - there isn't a Doctype for XHTML5 in the spec; in fact, the spec explicitly states there isn't a standard one.
David Dorward
+2  A: 

From http://hsivonen.iki.fi/xhtml2-html5-q-and-a/ :

If I can use any doctype for XHTML5, how can browsers tell XHTML 1.0 and XHTML5 apart?

They can’t and they don’t need to. By design, a user agent that implements XHTML5 will process inputs authored as XHTML 1.0 appropriately.

Craig