So here's what I understand (please correct me if I'm wrong):

  • HTML5 is the newest version (or at least soon to be released) of HTML and contains features that XHTML does not yet have
  • XHTML served as MIME type text/html is equal to HTML for the purposes of rendering
  • Converting from text/html to application/xhtml+xml is difficult because it's not HTML
  • XML is not compatible with HTML

So my question is, what does XHTML have to do with HTML besides the usage of tags? What is the practical purpose of using XHTML over XML, or is there none?

+2  A: 

Yes, HTML5 will have features (and even elements/tags) which XHTML does not yet have. It will also reintroduce some tags from HTML4 which were removed for XHTML (iframe is one of them, I think).

If you want to know about the differences between HTML4/XHTML and HTML5, read the wikipedia entry on HTML 5: http://en.wikipedia.org/wiki/HTML_5#Differences_from_HTML_4.01.2FXHTML_1.x

However, no, XHTML is not equal to HTML for the purposes of rendering. http://hixie.ch/advocacy/xhtml

XHTML is more XML than it is HTML. It essentially uses (most of) the elements ('tags') from HTML while respecting the stricter grammar and semantics of XML.

It is not equal to HTML4 for the purposes of rendering. If you use a mime type of XHTML - since it is XML - you can end up with nasty parse errors if you do it wrong:

[Image: a nasty XHTML parse error]

... but despite this problem using XHTML can result in more consistent behavior between browsers than HTML4; since HTML4 is not as strict, browsers try really hard to interpret ambiguous markup, and the browser's developers end up with some freedom to choose how these ambiguities are treated. This leads to inconsistencies, but with XHTML - being XML - the browser is supposed to refuse to render the page at all should it be ambiguous.
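
To make the strict-versus-forgiving difference concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not how any browser actually works: `xml.etree.ElementTree` stands in for an XML parser, `html.parser.HTMLParser` (a simple tokenizer, not the HTML5 parsing algorithm) stands in for a forgiving HTML parser, and the names `MARKUP`, `TagCollector`, `parse_as_xml` and `parse_as_html` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: an unclosed <p> and a bare <br>.
# Acceptable as tag soup, but not well-formed XML.
MARKUP = "<html><body><p>First line<br>Second line</body></html>"


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_xml(markup):
    """Return (True, None) if the markup is well-formed XML, else (False, message)."""
    try:
        ET.fromstring(markup)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


def parse_as_html(markup):
    """Feed the markup to the lenient tokenizer and return the start tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


xml_ok, xml_error = parse_as_xml(MARKUP)
html_tags = parse_as_html(MARKUP)
print("XML parse succeeded:", xml_ok, "| error:", xml_error)
print("HTML tokenizer saw start tags:", html_tags)
```

The XML parser gives up at the first well-formedness error, which is roughly what a browser does with a page served as application/xhtml+xml, while the HTML tokenizer carries on and reports whatever tags it can find.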

XHTML can lead to problems with older browsers though, especially with empty elements (self-closing tags) such as <br />; an old browser won't know what to do with the / and might think it's part of the tag name, and then you lose your line break.

Lastly, I don't know what you mean by "Converting from text/html to application/xhtml+xml is difficult because it's not HTML" but I encourage you to check out the book "Refactoring HTML" by Elliotte Rusty Harold... it seems to be a 300+ page answer to the question ;)

LeguRi
Browsers don't have the freedom to make up their own rules about how to parse HTML (anymore), now that HTML5 defines the correct parsing algorithm.
Ms2ger
Browsers never had the freedom to make up their own rules about how to parse HTML; only how to handle errors.
David Dorward
@ms2ger - Good point; I should have used the term "HTML4" instead of just "HTML" in that paragraph after the image. I'm editing it now :)
LeguRi
A: 

No, HTML5 will not have features that XHTML doesn't have, because it defines XHTML5 in the same document.

Ms2ger
+1  A: 

HTML5 is the newest version (or at least soon to be released)

Draft. Unstable. Subject to change. Not going to be released soon.

XHTML served as MIME type text/html is equal to HTML for the purposes of rendering

More or less. Writing HTML compatible XHTML is more work than writing HTML or XHTML, and doesn't let you use any of the interesting bits of XHTML.

Converting from text/html to application/xhtml+xml is difficult because it's not HTML

Converting from HTML to XHTML is actually pretty trivial. Tidy can do it for lots of documents (including all valid HTML documents).

XML is not compatible with HTML

It would be truer to say that HTML isn't XML.

So my question is, what does XHTML have to do with HTML besides the usage of tags? What is the practical purpose of using XHTML over XML, or is there none?

The advantage of using XHTML over XML is roughly the same as using a car over a pile of metal and plastic. XML is a toolkit for building markup languages.

The advantage of using XHTML over HTML is that you can mix different markup languages into one document, so (for example) you could have an XHTML+SVG+MathML document. This requires the client to support all the languages involved (or for the unsupported ones to gracefully degrade).
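
As a rough sketch of what such a mixed document looks like to an XML-aware consumer, the following Python snippet parses a small XHTML+SVG+MathML document and counts the elements in each namespace. The sample document and the names `DOCUMENT` and `elements_by_namespace` are invented for this illustration; only the three namespace URIs are the real ones.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical compound document: XHTML host with inline SVG and MathML.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Mixed namespaces</title></head>
  <body>
    <p>A circle and a formula:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="40" height="40">
      <circle cx="20" cy="20" r="15" />
    </svg>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>x</mi><mo>=</mo><mn>1</mn>
    </math>
  </body>
</html>"""


def elements_by_namespace(xml_text):
    """Count the elements of a document per namespace URI."""
    counts = {}
    for element in ET.fromstring(xml_text).iter():
        # ElementTree spells qualified names as "{namespace-uri}localname".
        if element.tag.startswith("{"):
            namespace = element.tag[1:].split("}", 1)[0]
        else:
            namespace = ""
        counts[namespace] = counts.get(namespace, 0) + 1
    return counts


counts = elements_by_namespace(DOCUMENT)
for namespace, count in counts.items():
    print(namespace, count)
```

Each element carries its namespace, so a client can hand the svg and math subtrees to whichever component understands them. That only works when the document goes through an XML parser, which is why serving it as text/html loses the benefit.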

Unfortunately, this is impractical for most projects since Internet Explorer… doesn't support XHTML

David Dorward
What does the file extension have to do with how the browser handles the file? In my experience how the web server handles the file is more important. I haven't tested it, but if you tell the web server to treat ".xhtml" like ".html" then there won't be a problem, right?
craigmoliver
@craigmoliver — Yes, it would be a problem, since you then lose all the benefits of XHTML (i.e. mixed namespace documents).
David Dorward
A: 

From a programmer's perspective: HTML had weaker (little or no) structural constraints by design, or at least web browsers did not require them to produce some form of output. Each browser had its own logic for trying to piece together the missing structure.

XHTML enforced stricter structural constraints ... and is an application of XML -- it is just stricter than HTML. The stricter semantics allow:

  1. More consistent algorithms across browsers -- which means more consistency across browser implementations.
  2. Fewer clock cycles per rendering -- which means lower-power devices can better handle rendering web pages.
  3. Stricter structural semantics are a requirement for the "semantic web", which means the structure can be leveraged to enable consistent extraction of useful information from web pages.

XHTML5 extends HTML to make the web more interactive, as far as I understand, and the change is perhaps more significant than the one from HTML to XHTML, as the stricter semantics should have been enforced from the get-go. XHTML5 adds a lot of noticeable changes to the language.

As far as XML vs HTML for the web: a browser needs some rudimentary fixed document structure to make sense of a document. Perhaps this isn't as significant as it used to be, by which I mean that XSLT and CSS can skin a document into presentable output for a web browser. However, a document should have some inherent, fixed, document-like structure that is meaningful to a browser without modern web techniques. HTML4 allows a document to be structured so that rudimentary browsers like lynx can display it; after all, lynx does not stand a chance of rendering CSS or elements like images.

XHTML5 is in another league altogether and probably excludes rudimentary browsers outright -- but its fixed tag vocabulary provides meaning for web browsers, web developers, and designers. Web browsers probably need certain tags to enable the new funky features -- but the vocabulary in itself is more elegant than having to think of a web page as an ad hoc XML document.

Hassan Syed
XHTML doesn't have "stricter structural semantics". It has the same semantics, a simpler structure, and stricter error handling rules. (1) No. It just came at a time when browsers were paying a bit more attention to specifications rather than writing tag soup scrapers without reading the SGML spec. (2) That's due to the simpler structure and the error handling requirements. It's moot these days; 'low power devices' are a lot more powerful than they were when XHTML first came out.
David Dorward
(3) XHTML adds nothing to semantics that wasn't already in HTML 4.01 (except for the possibility of mixing namespaces with things like RDF). (X)HTML5 is heading in the wrong direction as far as semantics are concerned (cf http://www.w3.org/TR/html5/the-xhtml-syntax.html#the-marquee-element-0 ).
David Dorward
@david I agree with (2)... however, if the tag soup way of doing things had continued along with the evolution of the design techniques, then the 'low power devices' would be in a far worse position nowadays. Go ahead and edit my answer to add your comments if you like -- it seems your terminology is better than mine :D
Hassan Syed