I wonder why I should use XHTML instead of HTML.

XHTML is supposed to be "modularized", but I haven't seen any server side language take advantage of any of that.

XHTML is also stricter, and I don't see the advantage. What does XHTML offer that I need so badly? How does it make my code "better"?

EDIT: another question I found in the comments: Does XHTML parse faster than HTML?

EDIT2: after reading all your comments and the links, I indeed agree that another post deserves to be the correct answer, so I chose the one that directly links to the best source.

Also, it goes to show that people upvote the green (accepted) answer without even reading it.

+14  A: 

For the visitor of a website it probably doesn't make any visible difference. Furthermore, XHTML is usually more of a pain to use as at least one widespread browser still doesn't know how to handle it and you need to serve it as text/html in that case (which yields invalid HTML).

If your HTML is going to be regularly processed by automated tools instead of being read by humans, then you might want to use XHTML: its structure is stricter and, being XML, it is easier to parse from an application standpoint (not that XML is inherently easy to parse, though).
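To make the "automated tools" point concrete, here is a minimal sketch in Python, assuming the page is well-formed XHTML in the standard XHTML namespace. The sample page and the helper name extract_links are made up for illustration; only the standard library's ElementTree parser is used.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, well-formed XHTML page used only for this illustration.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>See <a href="/docs">the docs</a> and <a href="/faq">the FAQ</a>.</p>
  </body>
</html>"""


def extract_links(xhtml_text):
    """Return the href of every <a> element, using a generic XML parser."""
    root = ET.fromstring(xhtml_text)
    # Elements live in the XHTML namespace, so the tag name is qualified.
    return [a.get("href") for a in root.iter("{%s}a" % XHTML_NS)]


links = extract_links(page)
print(links)
```

The same function would raise a ParseError on typical tag-soup HTML, which is exactly the trade-off being discussed: the tooling is generic, but only as long as the input really is XML.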

Apart from that, I don't see any compelling reason to use it. XHTML was created as an attempt to bring XML features to HTML, and basically it boils down to "HTML 4 with several annoying side-effects" (IMHO, at least).

Joey
+1. I completely agree. XHTML is another example of development by committees living in ivory towers (CSS is the other example I have in mind).
AnthonyWJones
Wholeheartedly agree about XHTML but the development of CSS has been fine until their relatively recent decision to just let the browser vendors work out what the standard is supposed to be. The problems with CSS look 100% the fault of the vendors to me.
annakata
I fundamentally disagree with the argument presented here. I was going to post a comment, but it grew too large, and became an answer (below). The XHTML spec. was not simply developed to make parsing easier (though it does do that), but to support a variety of technologies comprising a more complete structure than HTML was able to allow. Developed by those in ivory towers, maybe... but the specification makes a lot more sense when you take a web-wide view of things.
James Burgess
@James, well said.
Chuck Conway
I guess all the upvotes to that posting show two things: 1. many people really haven’t understood XHTML (see James’ answer), and 2. there’s a lot of frustration about this due to its perceived usefulness. Who’s to blame? I guess the “publicity” for XHTML was just really, really bad.
Konrad Rudolph
Oops, I meant “uselessness”, not “usefulness.”
Konrad Rudolph
@Konrad: XHTML is useful, no doubt, but those having upvoted might be: 1) Frustrated, as you said, by the fact that XHTML must (currently) be served as HTML! 2) Just outputting simple information to human users, not targeting external parsers, and with no need of modularity or other advanced features (namespaces...).
PhiLho
+4  A: 

Take a look at http://www.w3.org/MarkUp/2004/xhtml-faq#need There are some good reasons apart from modularisation.

I favor XHTML because it's stricter and more clearly laid out. HTML is quirky, and browsers have to accept things like <b><i>sadasd</b></i>. This is a really simple example, but it could get more confusing, and different browsers could lay things out differently. I also think that XHTML has to be "faster", since the browser doesn't have to do that kind of repair work.
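A rough way to see the difference in strictness, sketched in Python with the standard library only. Assumptions: html.parser is just a tolerant tokenizer, not a browser's real error-recovery algorithm, and the names TagLogger and xml_accepts are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The misnested example from above, wrapped in a paragraph.
snippet = "<p><b><i>sadasd</b></i></p>"


class TagLogger(HTMLParser):
    """Records tag events; it does not check nesting at all."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(text):
    """True if a strict XML parser considers the text well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


logger = TagLogger()
logger.feed(snippet)
logger.close()

print(logger.events)          # the tolerant tokenizer reports every tag
print(xml_accepts(snippet))   # the XML parser rejects the misnesting
```

The tolerant side happily reports the tags in the order they appear and leaves it to the consumer to decide what the tree should look like; the XML side refuses the document outright.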

lx
Browsers don't *have* to accept improperly nested tags (your example is invalid HTML), but they do - since authors use them, and throwing errors at users instead of rendering the page is unhelpful. Even served as application/xhtml+xml a lot of browsers will switch to text/html mode if they hit a well-formedness error.
David Dorward
the <b> and <i> tags are also deprecated in html as they're presentational. <strong> and <em> are the semantic equivalents.
Bayard Randel
@Bayard: <b> and <i> are not deprecated despite being presentational. http://www.w3.org/TR/REC-html40/present/graphics.html#edef-B (this hasn't changed in XHTML nor HTML5). Of course they shouldn't be used if other elements are more appropriate, but HTML doesn't have an element for everything, and using <em> for anything that's italic is just as wrong.
porneL
+1  A: 

XHTML forces you to be neat.

For example, in HTML, you can write:

<img src="image.jpg">

This isn't very logical, because the img tag never gets closed. In XHTML, however, you're forced to close the tag neatly, like this:

<img src="image.jpg" />

I like using something that forces me to be neat.

Steve

Steve Harrison
I think it does make sense that the img tag never gets closed. HTML!=XML and since the img tag has no contents, why should you close it.
Pim Jager
It is perfectly logical - so long as you don't treat it in isolation. There is no reason for an img element to have content, so the DTD says that it can't have content. Since it can't have content, you can imply (with 100% reliability) that anything that appears after the start tag is outside the element. The result is that the end tag is forbidden. This results in smaller markup. It is less intuitive, but easier to write once you have learned the rules.
David Dorward
1) The space before the ending slash is outdated; I doubt there are still many Netscape browsers needing it. 2) You can be neat and tidy with HTML, closing all tags that accept a closing part. But indeed, enforcing the rules might be easier for the developer.
PhiLho
@Pim Jager and @David Dorward: I see what you're saying with an image element having no content, but I think XHTML's way of doing it is more logical and consistent.
Steve Harrison
@PhiLho: 1) Thanks, I'll look into that. 2) I agree.
Steve Harrison
There is a logical content to put inside the <img> tag: the alt text. I think it was really a mistake to put the text content of any element inside its attributes, including <input>. Self-closing tags are a way to be sure that you're done with the tag regardless of which one it is. That way you don't have to remember which tags you have to close. <img src="img.png">Add Comment</img> Makes sense, doesn't it?
Jethro Larson
@Jethro Larson: I definitely agree with the <input> tag, but I'm not so sure about the <img> tag. In my opinion, the <img> tag is meant for displaying an image, not for providing a piece of text that describes an image. I always include an "alt" attribute if applicable—but since it's really an 'extra', I think it's better that it is an attribute of the <img> tag, not its content. At least, that's my opinion...
Steve Harrison
A: 

In my opinion, the strictness is, at least in theory, a good thing, because in HTML, you don't need to be strict, and because of that and the HTML5 junk, browsers have advanced error correction algorithms that will make the best out of broken HTML. The problem is that the algorithms are not exactly the same and will lead to really strange behaviour you can't predict. With XHTML, on the other hand, you typically have fine, valid XHTML and so the error correction algorithms are not needed, i.e. the entire browser behaviour is predictable. In addition, strict code makes it easier for your tools to work with the code. So you actually have nothing to lose by using XHTML, but there is some potential to gain. Things will get worse with plain HTML when HTML5 is finally out and the "be open in what you accept" will lead to the described strange behaviour. But at least then it's a standardized strange behaviour. Sigh.

On the other hand, if you use a good IDE like Visual Studio, it's almost impossible to produce broken HTML code anyway, so the result is the same.

OregonGhost
Browsers have error correction because people write bad HTML, not because HTML is "less strict". XHTML is no different - most browsers which support it will throw an error on non-well-formed data and then parse it with the HTML parser. (And they will try to error correct for well-formed but invalid XHTML).
David Dorward
Actually, that is just wrong. Browsers will throw an error on "non-well-formed data", and then just stop parsing, as per the spec. They don't continue to try and parse it. Go on, try it: get a random HTML document, change the doctype (or add one) to an XHTML one, and open it in Firefox. Watch how Firefox doesn't try to recover from the first error it hits. If it displays the page, it means that the HTML is also valid XHTML, which will not normally be the case (BR, IMG, HR and other self-closing tags have different forms in XHTML and HTML).
Alya
Firefox throws a yellow screen of death. Most XHTML capable browsers don't. Opera, for example, prompts the user to treat the page as text/html instead: http://realtech.burningbird.net/image-galleries/screenshots/opera-xhtml-error (and, if I remember rightly, WebKit just switches to text/html without prompting)
David Dorward
@David: You are right that technically, the reason browsers have error correction now is the bad HTML thing. But that doesn't matter, because we are where we are. Plain HTML has error correction in all browsers, no matter what is defined in the standard, and we will never get it to go away. Therefore, practically, HTML is less strict than XHTML in most browsers.
OregonGhost
A: 

As a programmer, you should be VERY concerned about your code. HTML is ugly and follows few rules.

XHTML on the other hand, turns HTML into a proper language, following strict structural and syntactic rules.

XHTML is better for everyone, as it will help move the web to a point where everyone (all browsers) can agree on how to display a web page.

XHTML is an XML descendant, and as such is much easier on parsers built for the job of analysing syntactically sound XML documents.

If you can't see the benefit of XHTML, you might as well be using MS Word to create your HTML documents.

Antony Carthy
HTML has pretty strict syntactic and semantic rules. They're just not the same as for XML. You may want to read the spec :)
Joey
HTML4 has. HTML5 has rules (or at least, there are many people who want it to have) how to correct mistakes, i.e. no longer has strict syntactic and semantic rules. And the problem with HTML4 is that Browsers already correct mistakes, rendering the "strict syntax" a joke.
OregonGhost
"If you can't see the benefit of XHTML, you might as well be using MS Word to create your HTML documents." .... Really?
Paolo Bergantino
I know the spec of HTML 4.01 but I don't feel that that's what WebDevHobo was asking. Most people don't realise that HTML 4.01 is a spec, so they code how they want, including, as above, <b><i>wrong</b></i>. Also, I never said it doesn't have rules, just that it has fewer. Please revise the -1, as this argument is more about using standards than the fact that there is a standard for HTML (tell that to IE6).
Antony Carthy
By the way, I think you've made a typo at the beginning of your fourth paragraph: "XHTM" rather than "XHTML"...
Steve Harrison
Thanks Steve, fixed. So you guys downvote me, a new user trying to get to 50 or 100 rep so I can comment, but the guy above is using uppercase tags in XHTML and you leave him be?!
Antony Carthy
The rules for what elements are allowed where are the same in HTML and XHTML (although the HTML DTD expresses more of them so it is easier to check for deviation in HTML documents). The HTML DTD also clearly shows which start and end tags are optional or forbidden. HTML has _more_ rules than XHTML - it has to to describe all the exceptions. The end result is just a different way to express the same DOM.
David Dorward
You say HTML is ugly... I say XML/XHTML is just as ugly *and* it's more verbose, so it propagates more of that ugliness. More substantially, I'm still not seeing what's going to prevent browser vendors from being "friendly" and doing their (incompatible) best to handle noncompliant XHTML, just like they do now with HTML, and giving us MSIE6-for-XHTML all over again.
Dave Sherohman
It's true that HTML4 follows few rules - i.e. the specification did not define parsing exactly. HTML5 fixed that (every document can be parsed unambiguously).
porneL
+3  A: 

Some differences are:

  • XHTML tags must be properly nested
  • The documents must have one root element
  • XHTML tags are always in lowercase
  • Tags must always be closed (e.g. the <br> tag must be written as <br /> or <br></br> in XHTML)
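
Most of the rules in this list can be checked mechanically with any XML parser. Below is a small sketch in Python (standard library only; the helper is_well_formed and the sample fragments are made up for illustration). One caveat, stated as an assumption of the sketch: it checks well-formedness only. The lowercase rule is a validity rule of XHTML, so a plain XML parser will not flag consistently uppercase tags; it only notices when the case of the start and end tags differs.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """True if the fragment parses as XML (well-formedness only, no DTD)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments, one pair per rule in the list above.
checks = {
    "properly nested": "<p><b><i>text</i></b></p>",
    "improperly nested": "<p><b><i>text</b></i></p>",
    "one root": "<div><p>a</p><p>b</p></div>",
    "two roots": "<p>a</p><p>b</p>",
    "closed br": "<p>line<br />break</p>",
    "unclosed br": "<p>line<br>break</p>",
    # XML is case-sensitive, so <BR> and </br> do not match each other...
    "mismatched case": "<p>line<BR></br>break</p>",
    # ...but consistently uppercase tags are still well-formed XML.
    "uppercase but matched": "<P>text</P>",
}

results = {name: is_well_formed(markup) for name, markup in checks.items()}
for name, ok in results.items():
    print(name, "->", ok)
```

Catching the uppercase case as well would need validation against the XHTML DTD, which this sketch does not attempt.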

Here are some links on it

wiki XHTML

wiki HTML vs XHTML

kevchadders
"XHTML tags are always in lowercase. Tags must always be closed (e.g. using the <BR> tag in XHTML must have closing tag <BR /> or <BR></BR> in XHTML)" Why is your <br /> tag in capitals?
Antony Carthy
doh... it's been a busy week! ;) Good spot, I've edited it.
kevchadders
HTML elements must also be properly nested. Sometimes start or end tags are optional or forbidden, though. HTML documents must have one root element.
David Dorward
The question wasn't about the differences, I suppose the poster knows them...
PhiLho
+1  A: 

The subtitle to the XHTML 1.0 recommendation:

A Reformulation of HTML 4 in XML 1.0

Many tools exist today to process XML. By using XHTML, you are allowing a huge set of tools to operate on your pages and to extract information programmatically.

If you were to use HTML, this would be possible too. There are tools in existence to parse HTML DOM trees. However, these tools can often be more specialized than those for XML. You may not find your favorite XML data processing tools compatible with HTML. Furthermore, there are so many uses for XML nowadays that you may be using XML for some other part of an application; why not also use that same XML parser to parse your web pages? This is the motivation behind XHTML.

If you're already comfortable and familiar with HTML 4.01, you have an established project using HTML 4, and you don't have tons of spare time, just go with HTML 4.01. If you have spare time, learn XHTML 1.1 anyway, and start your new projects in XHTML 1.1 – there's no harm in doing so. If you're using something other than HTML 4.01 or are pretty unfamiliar with HTML 4 anyway, just learn XHTML 1.1.

Wesley
The harm in using XHTML is either a lack of IE compatibility or limiting yourself to the common subset of XHTML and HTML. For example, you can't safely generate "Appendix C" XHTML with an XML serializer (e.g. <script/> will confuse non-XML parsers).
porneL
You're right. I wasn't thinking of generating XHTML when I wrote my answer. And yes, I was thinking of the common subset of HTML and XHTML to avoid IE parsing problems.
Wesley
+1  A: 

Using XHTML with the correct DocType will force the browser to render the content in a more standards-compliant (strict) mode. This makes the different browsers behave better and, most importantly, more like each other. This makes your job as a web developer a lot easier, since it reduces the amount of browser-specific tweaks needed to make the content look the same in all browsers.

Quirksmode.org has a lot of good info on this subject.

Marnix van Valen
Using HTML with the correct Doctype will also put browsers into Standards mode. Heck, using nonsenseML with a crazy made-up Doctype will do that too.
David Dorward
+28  A: 

I was going to add this as a comment to one of the other posts, but it grew a little too large.

The fundamental point that most people seem to be missing is the purpose behind XHTML. One of the major reasons for developing the XHTML specification was to de-emphasise presentation-related tags in the markup, and to defer presentation to CSS. Whilst this separation can be achieved with plain HTML, this behaviour isn't promoted by the specification.

Separating meta-markup and presentation is a vital part of developing for the 'programmable web', and will not only improve SEO, and access for screen readers/text browsers, but will also lead towards your website being more easily analysable by those wishing to access it programmatically (in many simple cases, this can negate the need for developing a specific API, or even just allow client-side scripts to do things like readily identifying phone numbers). If your web-page conforms to the XHTML specification, it can easily be traversed using XML-related tools, and things such as XPath... which is fantastic news for those who want to extract particular information from your website.
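
As a rough illustration of the phone-number case, here is a sketch in Python using the limited XPath subset built into the standard library's ElementTree. The page content is invented, and marking numbers with class="tel" is an assumption borrowed from the hCard microformat convention; a real site may mark them up differently.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map so the XPath expression can name XHTML elements.
NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page; the numbers are placeholders.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Contact</title></head>
  <body>
    <div class="vcard">
      <span class="fn">Example Person</span>
      <span class="tel">+1-555-0100</span>
    </div>
    <p>Office: <span class="tel">+1-555-0199</span></p>
  </body>
</html>"""

root = ET.fromstring(page)

# Select every <span class="tel"> anywhere in the document.
phones = [el.text for el in root.findall(".//x:span[@class='tel']", NS)]
print(phones)
```

Note that the predicate matches the class attribute exactly, so an element with several classes (class="tel work") would be missed; a fuller XPath engine or a bit of extra Python would be needed for that.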

XHTML was not developed for use by itself, but by use with a variety of other technologies. It relies heavily on the use of CSS for presentation, and places a foundation for things like Microformats (whether you love them, or hate them) to offer a standardised markup for common data presentation.

Don't be fooled by the crowd who think that XHTML is insignificant, and is just overly restrictive and pointless... it was created with a purpose that 95% of the world seems to ignore/not know about.

By all means use HTML, but use it for what it's good for, and take the same approach when looking at XHTML.


With regard to parsing speed, I imagine there would be very little difference in the parsing of the actual documents between XHTML and HTML. The trade-off will come purely in how you describe the document using the available markup. XHTML tags tend to be longer, due to required attributes, proper closing, etc. but will forego the need for any presentational markup in the document itself. With that being the case, I think you're talking about comparing one type of apple, with a very slightly different type of apple... they're different, but it's unlikely to be of any consequence (in terms of parsing and rendering) when all you want is a healthy, tasty apple.

James Burgess
Whilst XHTML goals are laudable the browser vendors aren't playing ball well with either XHTML or CSS. Proper XHTML currently makes Web development harder not easier. When the vendors of tools and browsers change this situation then perhaps we will be able to start to see some benefits but those who stand to gain the most are not the site developers but the SE vendors. Imagine being able to present rich meaningful info harvested from your sites to a degree where the site itself needs no visit? What happens to your advertising revenue? Who do you think lives in the ivory towers anyway?
AnthonyWJones
+1. Good counter argument
AnthonyWJones
Good counter-counter argument; and I see your point, but the isolation point is not purely XHTML-related, as you could argue that RSS feeds have done that. However, people still visit websites, as many people don't like reading via feeds. The point is to allow easier access to data, and to mark it up semantically, rather than to hide it in a jumble of presentation-related, poorly-formed tags. Making your website easily accessible and 'meaningful' for processing is only a benefit for you. As regards browsers rendering markup, I can agree, but show me consistency in HTML rendering first.
James Burgess
+1 - thanks for addressing my comment!
Dominic Rodger
The summary of the summary then is that the interoperability benefits of XHTML are threatened by the inadequacies of browser vendors.
annakata
(and I slightly disagree with you on performance - in theory XHTML doesn't have to allow for and make the kind of corrections that poorly formatted HTML demands. With well written HTML I imagine the differences are near zero, but XHTML will always get to take shortcuts)
annakata
Agreed. It would be nigh-on impossible to get accurate benchmarks for the two, as you can't actually write 100% equivalent markup, and exhibit a difference (by the very nature of the discussion). Most of the difference is going to come down to the length of document to be parsed... and also bear in mind that poorly-formatted HTML is usually invalid, anyway.
James Burgess
Separating presentation from markup was a goal that predates XHTML by some time, using the strict doctype. And HTML 4 _did_ phase out presentational markup in favor of structural markup. Just because nearly everyone uses HTML 4 Transitional doesn't mean it's a good idea or even recommended by the spec. Apart from that, you don't get true separation, as you nearly always have to adapt your markup to your style. CSS Zen Garden is nice, but for some CSS effects you need several layers of <div>s. And that's not really a good separation of style and content, imho. HTML and XHTML are very much alike here.
Joey
The concept pre-dates XHTML, but XHTML makes it a priority, and the specification attempts to put some form of strict control on it. HTML4's 'support' of separation of presentation and markup is sloppy at best. <div>s are not presentational tags, they are for the division of information in a logical fashion. Just because they're not used properly, doesn't mean that the XHTML specification is invalidated in any way. XHTML encourages far better practices than HTML does, and is far more strict on poor syntax. It's a win-win situation for those wanting to work towards a standards-based web.
James Burgess
Your response is *very* inaccurate. HTML4 supports separation exactly the same way as XHTML1.0: http://www.w3.org/TR/REC-html40/present/styles.html#h-14.1 simply because the XHTML1.0 specification *does not include a definition of a single element or attribute*. It refers to HTML4 in all aspects related to semantics and presentation.
porneL
In the XHTML1 spec, the word "presentation" occurs only once, in the definition of the word "rendering". XHTML 1.0 allows the <font>, <center> and <strike> elements. Your argument is based on self-perpetuating advocacy, not the XHTML specification.
porneL
You make the incorrect assumption that the purpose of a specification is the specification itself. The semantic implementation of XHTML is defined by the specification, that is true, and it also allows for traditional presentation-based markup. However, the purpose and intent behind XHTML is far different and is not embodied fully in the specification itself. It would be sensible to verify the intent and meaning of someone's answer, before claiming that they base their answers on "self-perpetuating advocacy," when you, yourself, have missed the point of the answer entirely.
James Burgess
At no point do I state that the specification restricts or discourages presentation-based markup, however, I do state that "one of the major reasons for **developing** the XHTML specification was to de-emphasise presentation-related tags in the markup." This is a vastly different concept. I would appreciate if you would revert your down-vote, in light of the fact that you have failed to read the post carefully enough to give a valid response and have merely assumed that I am unable, or unwilling, to read the specification for myself. I have read it... but I have also read around it.
James Burgess
porneL
That's hardly proof that XHTML itself failed miserably. The page's XHTML is generated by Wordpress, which is renowned for poor XHTML syntax.
James Burgess
XHTML implementations are so poor, because uptake of XHTML is so poor. The web-browser industry largely caters for demand. If developers make sloppy HTML, browsers will account for it. If developers wrote valid XHTML, maybe browsers would develop better support for it. Don't blame the standard for poor uptake, and don't blame the standard for poor implementation of the standard. I don't think the XHTML standard is great, I don't think it's perfect, but I do think it's a step in the right direction. It'll be of no relevance, though, if no-one bothers to use it.
James Burgess
The XHTML1.0 spec contains information why it was developed and what its goals are. Your opinion is clearly based around something that's external to XHTML, and purely ideological. Technically it has no ground.
porneL
Once again, if you read my comments (or my answer) for what's there - not what you thought was there - you'd understand that the consideration of XHTML was not purely taken from the specification. You can't define the entire use, purpose, and history of a technology standard by the standard itself. It's impractical to do so. I'd recommend going and reading up on some of the issues surrounding the creation of XHTML, or maybe even the viewpoints of those who formed the XHTML specification, regarding presentation/markup, and the HTML 'tag soup'.
James Burgess
Can you point me to authoritative source of your claims?
porneL
That depends... what do you class as an authority? There is a plethora of information on what people like Tim Berners-Lee thought/think of the XHTML spec that's only a Google search away (I turned up an article on his view on the poor uptake of XHTML1.0, and another on the formation of the standards working group for the XHTML standard). There is no one 'authoritative' source. The spec is the source of so-called 'authority', but it doesn't give the background, the purpose, or even the history of ideas that led to the formation of that purpose. You're truly asking the wrong question here.
James Burgess
The OP asked for reasons to use XHTML over HTML (or vice versa). I gave him reasons that are perfectly valid - because, on specification level alone, you'd pick HTML due to its simplicity of development, and tolerance. But that's simply not the point, and that's evidently not the best solution for a plethora of possible outcomes. For example: I hate parsing data programmatically from HTML pages, but valid XHTML pages are far easier. Where does it state that benefit in the spec? Nowhere. But does that make it any less of a real benefit, or purpose? No.
James Burgess
"I hate parsing data programmatically from HTML pages, but valid XHTML pages are far easier. Where does it state that benefit in the spec?". In the second point of "Why the need for XHTML?" (and there are only two).
porneL
Oh, and BTW: I hate parsing XHTML programmatically, because it requires a validating parser that has access to a DTD catalog. HTML parsers are self-contained and less fussy.
porneL
The second point of that section talks about user-agent interoperability, not programmatic data extraction. Parsing an entire document for rendering is a very different task to parsing data for extraction/analysis. Either way, I've really had enough of arguing the point, as you seem to have the view that the specification is everything. You can take that view, by all means, but you'll miss the essence of so many technological developments. The ideas proposed by XHTML truly are a good thing (the implementation, maybe not so much)... but cynics always find a way to quash potential. *shrug*
James Burgess
I agree with you about ease of parsing and data extraction (this is XML's goal and XHTML is about moving to XML). However, concerns about structure and presentation predate XHTML: http://www.w3.org/TR/REC-html40/present/styles.html#h-14.1 And it was HTML4 that had presentational de-emphasis as a goal: http://www.w3.org/TR/html401/intro/intro.html#h-2.3.5 You are attributing these things to XHTML, which is clearly wrong.
porneL
http://www.w3.org/TR/REC-html40/sgml/dtd.html This is HTML 4.01 Strict DTD, which excludes the presentation attributes and elements that W3C expects to phase out as support for style sheets matures. Authors should use the Strict DTD when possible, but may use the Transitional DTD when support for presentation attribute and elements is required.
porneL
They do predate XHTML... and aspects are present in HTML4. I didn't isolate any concept as being purely applying to XHTML - but that the culmination of many concepts has led to a standard that pushes the boundaries in a helpful direction. Overlap is inevitable when the two standards are developed by the same working group, with a similar vision. Please... stop shifting the argument. I tire of the XHTML vs. HTML debate in general, as it always moves towards "well, this pre-dates XHTML" (and the like). XHTML brings together a swathe of good ideas and concepts... that doesn't make it poor.
James Burgess
I'm referring to "One of the major reasons for developing the XHTML specification was to de-emphasise presentation-related tags in the markup, and to defer presentation to CSS." which is provably false, because presentation was already deferred to CSS in HTML4 (I've shown you the links) and you haven't backed up your claims with anything but rhetoric and vague references.
porneL
Seriously... I've answered every question you've asked, and all you've done is shift the goal posts every time. What is the point? My argument is not invalid, you just disagree with it. It's not invalidated by the fact that HTML4 incorporates similar aspects, as the standards were developed by the same people. Perhaps I should have phrased the sentence you quoted better, but the sentiment still stands: the specification was borne out of a number of frustrations with HTML3, one of which was the hodge-podge of information+presentation. This idea was also carried through to the HTML4 spec.
James Burgess
The intent of the XHTML WG is a red herring. What XHTML allows and what it can do is what matters, and XHTML1.0 fails to convey an intent of de-emphasising presentational markup (the situation is different in XHTML2; if you were talking about XHTML2, please clarify that in your answer).
porneL
I was non-specific in my answer because I am giving only a supporting argument to the XHTML standard. There is no right or wrong answer to this question, as there's nothing that mandates use of one standard or the other. I gave an argument *for* XHTML, just as others have given arguments *against* HTML. Nothing in my argument is invalid or inherently disprovable, but several things are not part of the specification, but part of the WG brief, or similar. If you're going by what a standard allows, rather than what is encouraged, then this argument is totally moot...
James Burgess
Could you add examples to your answer? E.g. a non-presentational construct that XHTML enables, or presentational HTML markup encouraged in HTML4 that is no longer encouraged in XHTML? Is XHTML needed to achieve what you say that XHTML encourages? I think Stack Overflow comments are not appropriate for such a lengthy discussion. Would you discuss this issue with me on another forum?
porneL
I didn't say it "enables", I implied that it promotes the separation. I'll happily discuss this with you elsewhere, provided the discussion is not going to continue along the lines of "the spec doesn't say that", "it's part of the history/purpose", "but the spec doesn't say that." If we're not going to make any progress to reconciling a viewpoint of some description, then the discussion is largely pointless, anyway.
James Burgess
+2  A: 

XHTML lets you use all the tools designed for XML, among them XSLT, SVG embedding, etc.

Pierre
SVG is nice in theory, but most webpages hit the MSIE problem when it comes to SVG. (And work is going on to describe SVG in HTML5)
David Dorward
I wouldn't say allows, but makes it easier. HTML allows SVG via <object> and data: URI (not as pretty, but possible). XSLT can output HTML, and there are tools that can parse HTML and pass it to an XSLT processor (e.g. in PHP you just need to change loadXML() to loadHTML() and it all works).
porneL
SVG in a data URI on Gecko doesn't allow any styles (bug 308590), for one, which makes it kind of a non-starter.
Ken
A: 

Use XHTML

  • Fails fast. If there are any inconsistencies they will be found during validation.
  • It encourages better design by separating semantic markup from presentation etc.
  • It's structured which means that you can treat it as a data object and run all sorts of queries against it. For example you could find all addresses or citations within your website.
  • You can do build-time optimizations. Since it's well-formed XML you can easily do find/replace operations during build time. Or any document management and manipulation.
  • You can write XSLT or other transformation scripts to programmatically transform your XHTML for other platforms. For example you could have an XSLT for the iPhone that would transform all XHTML to make it compatible or more user-friendly for the iPhone.
  • You are future proofing yourself. Transforming XHTML to newer semantics is again, very easy using transformation.
  • Search engines will continue to evolve to gather more semantic information as part of the programmable web.
  • DOM operations are more reliable since it's structured.
  • From an algorithmic perspective, it yields easier and faster parsing.
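To make the "treat it as a data object" point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The page content and the find_text helper are made up for illustration, and it only works if the input is well-formed XML, which is exactly the assumption under debate in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace that every XHTML element lives in when parsed as XML.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page, used only for this example.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Contact</title></head>
  <body>
    <p>See <cite>First Source</cite> and <cite>Second Source</cite>.</p>
    <address>1 Example Street</address>
  </body>
</html>"""


def find_text(document, tag):
    """Return the text of every element named `tag` in a well-formed XHTML document."""
    root = ET.fromstring(document)
    # ".//x:tag" searches the whole tree; the prefix maps to the XHTML namespace.
    return [element.text for element in root.iterfind(f".//x:{tag}", XHTML_NS)]


citations = find_text(PAGE, "cite")
addresses = find_text(PAGE, "address")
```

The same query against tag-soup HTML would need an HTML parser first, which is the trade-off the comments below argue about.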
aleemb
If you validate HTML you will find those inconsistencies too. Separating structure and presentation is better expressed with Transitional/Strict not HTML/XHTML. HTML is structured too. You can do those operations with HTML too (you just can't use an XML parser). XHTML is not more structured. From an algorithmic perspective, you have to serve it as text/html if you want it to work in MSIE so browsers get no benefit.
David Dorward
@David: You can run queries and do operations on flat file logs (have a look at log parser) but that doesn't mean it's well suited to the task. XML has a wide variety of applications and the tools ecosystem is much richer. For example, if you are using NAnt you could use XPath to get partial trees from one XHTML file and inject them into another at build time. MSIE issues should not mean that XHTML is at fault. As I mentioned, with XHTML you have some future proofing as browsers improve. In either case it doesn't end with browsers. There's the programmable web and semantic search engines.
aleemb
validation != check for well-formedness. Show me where XHTML spec says to separate semantics from presentation. Show me where XHTML spec defines structure other than that in HTML4. With help of HTML/SGML parser, XSLT can read and write HTML. XHTML does not introduce new semantics. Google parses XHTML as if it were HTML. RDFa works in HTML. W3C plans to use HTML5 for next 20 years. Most JS libraries don't work with XML DOM. XML with namespaces is PITA to parse.
porneL
@porneL, some interesting points, albeit you are being pedantic and need to be open to the pragmatic scenarios. For starters, XHTML is XML, which means it's a data structure. If the implications of that are not clear to you right away, consider a simple scenario where you could just send an XHTML fragment to the server, which could just accept it as an XmlFragment object, pass it around, maybe even persist it directly to the database or serialize it back to the server after any changes. All of which is lost to you because you are caught up in a pedantic debate about the spec (cont...)
aleemb
(...cont) I mentioned tools like NAnt which support XPath, for which XHTML is very useful--I don't have the time or inclination to make it work with HTML. That's just one tool and one scenario; there are plenty of others. Since it's a data structure, mining it is also much easier. I agree with you on the namespaces issue, but I have not run into issues with JS libraries.
aleemb
I agree that it might be useful for working with fragments and data extraction from well-formed documents. If you're already using XML toolchain, then that's exactly where XHTML shines. By sending fragments, do you mean XMLHTTPRequest? That works with XML even when whole document is HTML.
porneL
I used and advocated XHTML for a few years, but eventually gave up on it for pragmatic reasons: I couldn't escape supporting IE (even though I hate it). I found that the largest mobile ISP in my country inserts ill-formed tags into XHTML documents. I ran into problems with text/html and properly serialized XML (<script/>, <div/>, etc.), and into lots of problems with non-serializing templating languages. I found that I cannot parse XHTML with entities without setting up an XML Catalog. I discovered that my framework and database accept invalid UTF-8, which breaks the site for users.
porneL
A: 

XHTML is a good starting point because, if you want valid code, you have to provide some help to the disabled community: screen readers need the alt and title parts of the image and link tags. It must be faster to parse to an extent because, unlike HTML, the parser wouldn't need to check whether a tag wasn't closed properly, whether it was nested correctly, etc. It is also better to use because, yes, it is strict, but it helps you to think more logically (in my opinion) when it comes to learning programming languages.

Marc Towler
XHTML 1.0, 1.1 and HTML 4.01 have the same requirements and rules when it comes to alt and title attributes. XHTML is theoretically faster to parse if treated as XML, but since IE doesn't support that you have to serve it as text/html to get it to work, so that benefit isn't realised.
David Dorward
A lot of screen reader users use IE (mostly because for very long time no other browser worked well with them), and IE does not support XHTML at all (it may seem otherwise when you don't send proper MIME type). If you've really used XHTML, you'd actually make page completely inaccessible to majority of screen reader users.
porneL
A: 

I believe XHTML is (or should be) faster to parse. A valid XHTML document must be written to a stricter spec in that errors are fatal when parsing, whereas HTML is more lenient and allows for oddities mentioned before my comment like out of order closing tags and such. I found this helpful in uncovering the differences between HTML and XHTML parsing:

http://wiki.whatwg.org/wiki/HTML_vs._XHTML#Parsing
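As a rough illustration of "errors are fatal" versus "lenient", the sketch below feeds the same mis-nested markup to Python's standard XML parser and to html.parser. Note the assumptions: html.parser is only a tokenizer, not a browser-grade HTML parser, and TagCollector and xml_is_well_formed are names invented for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Closing tags are out of order: </b> arrives before </i>.
MISNESTED = "<p><b><i>out of order</b></i></p>"


class TagCollector(HTMLParser):
    """Record start/end tag events without enforcing any nesting rules."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False if it aborts."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


collector = TagCollector()
collector.feed(MISNESTED)
collector.close()

lenient_events = collector.events          # the tokenizer reports every tag it saw
strict_ok = xml_is_well_formed(MISNESTED)  # the XML parser rejects the document
```

The XML parser stops at the first mismatched tag, while the HTML tokenizer carries on and leaves recovery to whatever consumes the events. This says nothing about speed; it only shows where the error handling lives.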

A reason you might use XHTML over HTML might be if you intend to have mobile users as part of your audience. If I recall, many phones use something more of an XML parser, rather than an HTML one to display the web. If you are writing for desktop browsers, HTML would probably be acceptable.

That said, if you are going to serve the data as text/html anyway, you should use HTML:

http://www.hixie.ch/advocacy/xhtml

cm2
HTML doesn't "allow" closing tags to be out of order. It just doesn't require that parsers throw an error - and I've never owned a phone which could handle XHTML but not HTML (and I've been using mobile Internet for many years)
David Dorward
+8  A: 

I'm surprised that all the answers here recommend XHTML over HTML. I am firmly of the opposite opinion - you should not use XHTML, for the foreseeable future. Here's why:

  • No browser interprets XHTML as XHTML unless you serve it as mimetype application/xhtml+xml. If you just serve it with the default mimetype, all browsers will interpret it as HTML - eg, accepting unclosed or improperly nested elements.

  • However, you should never actually do this, as Internet Explorer does not recognise application/xhtml+xml, and would fail to render the page completely.

  • There are significant differences in the DOM between XHTML and HTML. Since all so-called XHTML pages are being served as HTML at the moment, all javascript code is written using the HTML DOM. If support for the XHTML mimetype becomes significant enough to convince people to start using it, most of their javascript code will break - even if they think their pages validate as XHTML.
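For reference, one work-around people use for the mimetype problem is content negotiation: send application/xhtml+xml only to clients whose Accept header asks for it, and text/html to everyone else. Below is a minimal sketch of that decision, assuming a simplified reading of the Accept header; accepts_xhtml and choose_content_type are hypothetical helper names, not part of any framework.

```python
XHTML_TYPE = "application/xhtml+xml"


def accepts_xhtml(accept_header):
    """Return True if the Accept header lists application/xhtml+xml with q > 0."""
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != XHTML_TYPE:
            continue
        quality = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not accepted"
        return quality > 0
    return False


def choose_content_type(accept_header):
    """Pick the MIME type to send; fall back to text/html when in doubt."""
    return XHTML_TYPE if accepts_xhtml(accept_header) else "text/html"
```

This does not remove the DOM problem described above: the same page then runs under two different DOMs depending on the client, so scripts have to work in both.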

Daniel Roseman
This is hardly a point... as if no-one ever uses it, how is it ever going to become an accepted standard? There are plenty of work-arounds in place, just as there are plenty for huge numbers of the web's emerging technologies (heck, even PNGs with transparency still need a work-around in IE).
James Burgess
+6  A: 

Use HTML (HTML4 Strict or HTML5).

  • HTML can fully utilize CSS, can be validated and parsed unambiguously. Separation of structure and presentation has been done in HTML4 and XHTML merely continued that.

  • All browsers support HTML. Only some browsers support XHTML and those that do, often have more mature and better tested and optimized support for HTML (it's caused by the fact that tiny fraction of pages uses XML mode).

  • If you care about IE and Google, you have to use HTML or the subset of XHTML and HTML defined in Appendix C of the XHTML spec. The latter is almost the worst of both worlds, because such XHTML cannot be generated with standard XML tools, cannot use the extension mechanisms new to XHTML, and has additional limitations over those in HTML alone.

  • XHTML1.0 is now over 10 years old, it was designed in "Web1.0" times, and as head of W3C said, in retrospect it didn't work out and better approach is needed. W3C HTML5 is written as we speak and addresses needs of web applications used today, and has very good backwards compatibility.

  • HTML5 closes many gaps that were between HTML4 and XHTML1 (e.g. adds inline SVG, MathML and RDF), and cleans up the language beyond what was done in XHTML1.0 and XHTML1.1.

  • XHTML2 is not going to be supported by web browsers in the foreseeable future. It's likely that it will never be supported (all browser vendors heavily support [X]HTML5, some have already declared that they won't implement XHTML2).


XHTML1.0 has exactly the same semantics and separation of presentation from structure as HTML4.01. Anybody who says otherwise hasn't read the specification. I encourage everybody to read the spec – it's surprisingly short and uninteresting.

  • Stylesheets were introduced in HTML4.01 and were not changed in XHTML1.0.
  • Presentational elements were deprecated in HTML4.01 and were not removed in XHTML1.0.

XHTML myths.


There are no intractable differences between HTML and XHTML that would make parsing of one much slower than the other. It depends on how the parser is implemented.

  • Both SGML and XML parsers need to load and parse entire DTD in order to understand entities. This alone is usually more work than parsing of the document itself. HTML parsers almost always "cheat" and use hardcoded entities and element information. XHTML parsers in browsers cheat too.
  • Parsing of HTML requires handling of implied start and end tags, and real-world HTML requires additional work to handle misplaced tags.
  • Proper parsing of XHTML requires tracking of XML namespaces.
  • Draconian XML rules require checking if every character is properly encoded. HTML parsers may get away with this, but OTOH they need to look for <meta>.

The overall difference in cost of parsing is tiny compared to time it takes to download document, build DOM, run scripts, apply CSS and all other things browsers have to do.
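Two of the points above (entities need the DTD, and XML namespaces have to be tracked) are easy to see with a generic XML parser. This is a small sketch with Python's xml.etree.ElementTree, which does not load the XHTML DTD; the sample markup and the parse_or_error helper are invented for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Named entities such as &eacute; and &nbsp; are defined in the XHTML DTD,
# which a plain XML parser does not know unless it is told where to find it.
WITH_NAMED_ENTITIES = f'<p xmlns="{XHTML_NS}">caf&eacute;&nbsp;menu</p>'

# Numeric character references need no DTD at all.
WITH_NUMERIC_REFS = f'<p xmlns="{XHTML_NS}">caf&#233;&#160;menu</p>'


def parse_or_error(markup):
    """Return (element, None) on success or (None, error message) on failure."""
    try:
        return ET.fromstring(markup), None
    except ET.ParseError as exc:
        return None, str(exc)


named_element, named_error = parse_or_error(WITH_NAMED_ENTITIES)
numeric_element, numeric_error = parse_or_error(WITH_NUMERIC_REFS)

# The namespace becomes part of the element name, so code that
# looks for a bare "p" will not match this element.
qualified_tag = numeric_element.tag
```

Here the first parse fails on the undefined entity, while the second succeeds and yields a namespace-qualified tag. Browsers avoid the DTD round trip by hardcoding the entity list, which is the "cheating" mentioned above.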

porneL
very good answer
kemp
+3  A: 

Instead of continuing to debate HTML 4.01 Strict vs XHTML Strict, I would suggest starting to use HTML 5 today. John Resig, the author of jQuery, made a similar suggestion last year on his blog.

The HTML 5 doctype, in its beautiful simplicity, will trigger standards mode in all browsers (including IE6).

<!DOCTYPE html>

That's it.

HTML 5 provides some exciting new features such as the <canvas> tag which potentially can push javascript application development to the next level. HTML 5 also has proper support for media (and media is a fairly important aspect of the web these days!) in the form of <video> and <audio> tags.

If you like the syntax of XHTML, i.e. closing "empty" tags such as <br />, that is fully supported in HTML 5. From Karl Dubost of the W3C's post Learn How To Write HTML 5:

auto-closing tag is allowed and conformant in HTML 5.

XHTML2 has received relatively little attention compared to HTML 5. It's becoming increasingly clear that HTML 5 is the future of markup on the web. Microsoft's latest browser, IE8 still renders XHTML served as text/xml as text/html.

Microsoft have a co-chair on the W3C HTML working group and there's an implied support from them for HTML 5. All of the browser vendors have publicly announced their support for HTML 5.

At the end of the day, even if XHTML2 regains support from the industry, it won't be a significant issue having two competing standards as it has been in the past. Both languages support XML namespaces (in the case of HTML 5, serialization of HTML i.e. DOCTYPE switching).

Bayard Randel
Can't help but agree that HTML5 does seem to be a very promising standard. I do hope that it fares better than the XHTML2 specification did.
James Burgess
Still, not closing tags makes me feel dirty.
WebDevHobo
@WebDevHobo, you can close your tags and they will still validate correctly against HTML 5 - it's part of the spec. I prefer this myself too (see update on post for citation).
Bayard Randel
+29  A: 

You should read Beware of XHTML, which is an informative article that warns about some of the pitfalls of XHTML over HTML.

I was pretty gung ho about XHTML until I read it, but it does make several valid points, including the following bit:

XHTML 1.x is not “future-compatible”. XHTML 2, currently in the drafting stages, is not backwards-compatible with XHTML 1.x. XHTML 2 will have lots of major changes to the way documents are written and structured, and even if you already have your site written in XHTML 1.1, a complete site rewrite will usually be necessary in order to convert it to proper XHTML 2. A simple XSL transformation will not be sufficient in most cases, because some semantics won't translate properly.

HTML 4.01 is actually more future-compatible. A valid HTML 4.01 document written to modern support levels will be valid HTML 5, and HTML 5 is where the majority of attention is from browser developers and the W3C.

Future compatibility can be huge when working on some projects. The article goes on to make several other good points, but I think that may have stood out the most for me.

Don't mistake the article for a rant against XHTML. The author does talk about the good points of XHTML, but it is good to be aware of the shortcomings before you dive in.

James McMahon
A nice article and props to you for showing it to us.
WebDevHobo
XHTML 1.x is “future-compatible”. The charter for the XHTML 2 working group has now expired. - It has now been superseded by XHTML5
Casebash
+1  A: 

Interesting development: XHTML 2 Working Group Expected to Stop Work End of 2009, W3C to Increase Resources on HTML 5

2009-07-02: Today the Director announces that when the XHTML 2 Working Group charter expires as scheduled at the end of 2009, the charter will not be renewed. By doing so, and by increasing resources in the Working Group, W3C hopes to accelerate the progress of HTML 5 and clarify W3C's position regarding the future of HTML. A FAQ answers questions about the future of deliverables of the XHTML 2 Working Group, and the status of various discussions related to HTML. Learn more about the HTML Activity.

Well, I guess that makes the future of HTML pretty clear.

Alec