In an interview I was asked a question that I'd never thought about, which was "We already have HTML which fulfills all the requirements of writing a web page, so what's the need for XHTML?"

I Googled a lot and read many articles, but I still don't properly understand why XHTML was introduced. Please explain it to me.

+33  A: 

Because it is valid XML. That helps a lot since you can use a lot of tools originally designed for XML, such as XML parsers, XSLT, XPath, XQuery, ...

Normal HTML is an SGML dialect, and that is not parsable without knowledge of the schema.

<ul>
    <li>one
    <li>two
    <li>three
</ul>

is correct HTML but not well-formed XML. If you want to parse it, you have to know that ul elements have to be closed but li elements don't.
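
A minimal sketch of that difference, assuming Python and its standard-library XML parser (neither is part of the original answer; the helper name is_well_formed is made up for illustration). A generic XML parser rejects the list with unclosed li elements and accepts the version where every element is closed, without knowing anything about HTML:

```python
import xml.etree.ElementTree as ET

# The list from the answer: valid HTML 4, but not well-formed XML.
html_list = """<ul>
    <li>one
    <li>two
    <li>three
</ul>"""

# The same list with every li closed, as XHTML requires.
xhtml_list = """<ul>
    <li>one</li>
    <li>two</li>
    <li>three</li>
</ul>"""


def is_well_formed(markup: str) -> bool:
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(html_list))   # the unclosed li elements are rejected
print(is_well_formed(xhtml_list))  # the closed version parses
```

The parser needs no schema here: well-formedness alone decides whether the input is accepted.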

Johannes Weiß
And it is XML-based because we can share XML data platform-independently, right? If that's wrong, then why are we so concerned about combining HTML with XML?
Prashant
I think this says what it is, but not why it's a good thing; being able to use XML tools on your HTML isn't a very valid reason.
jcollum
Andrew Grant
OK, then what's the valid and appropriate reason? I want to know that.
Prashant
Adding XML to HTML provides the data validation and transformation capabilities of XML to HTML. It also allows for more reliable parsing, query, and rendering.
Dave Swersky
OK, that's fine. There are lots of answers to this question with many points. But if I want to answer the interviewer, what would be the short, simple and convincing answer?
Prashant
I'd say: "It's XML and therefore very easy to determine if the syntax is correct (without knowing the schema). (for humans AND for computers)"
Johannes Weiß
@Andrew Grant: I don't recall ever running an XML tool on generated HTML. Why would I, when I could just change the thing that's generating the HTML?
jcollum
@Andrew Grant: OK, I can see some applications for screen scraping. But I doubt people started XHTML so I could screen scrape better.
jcollum
Using XHTML lets you use XML toolchains to build the XHTML more easily in the first place. My website uses an XML templating engine to produce XHTML, and the internal/intermediate representations are completely handleable with standard XML tools. At the final output generation stage, if the web browser does not support XHTML properly, it converts it to HTML4, but it's still all XML in the back end. Using standard XML tools to get to the final output is the big benefit IMO.
Michael E
(non-X)HTML *as implemented* isn't even a very good SGML dialect -- just look at how browsers handle `<tag/>` and compare it with what SGML specifies, or look at what SGML allows for comments and then compare it to what browsers actually do with HTML comments. They're basically irreconcilable.
hobbs
+8  A: 

I am sure you must've encountered this article from W3. There is a lot to learn from that article. In short, XHTML abides by the XML rules while keeping the HTML set of tags. The most important differences:

* XHTML elements must be properly nested
* XHTML elements must always be closed
* XHTML elements must be in lowercase
* XHTML documents must have one root element
Perpetualcoder
finally a question i can answer, but I'm too late
DrG
answers what, but not why, -1
jcollum
@jcollum- Section 1.3 in the link I provided tells you why you need XHTML.
Perpetualcoder
+2  A: 

I think that it helps browsers correctly display the HTML without making assumptions about where tags should be closed. Any time a browser assumes something, you know what happens.

jcollum
That's not true. HTML is well defined even if you don't close the tags. Some tags have to be closed, some don't. The sample from http://www.w3.org/TR/html4/struct/lists.html is correct HTML! The browser KNOWS that li tags cannot be nested.
Johannes Weiß
What is right about it is that most people didn't understand which tags have to be closed and which don't, so the browsers started to guess what you meant. The XML rule that everything has to be closed is better because it's simpler. But the browsers could have enforced the HTML rule, too!
Johannes Weiß
+17  A: 

In addition to Johannes' answer, HTML is far too loose in its interpretations and tolerance, where XHTML's strict formalisation negates this.

Tolerance leads to variance, which leads to browser incompatibilities, which leads to the dark side.

annakata
Tolerance is very useful. Strict adherence to an XHTML DTD or Schema takes the X __out__ of XML. I for one am hoping to stave off the straitjacket of application/xhtml+xml for as long as humanly possible.
AnthonyWJones
Not here. A well defined child implementation of XML should remain standard. What the hell is a browser supposed to do with your <myTag>?
annakata
The dark side stronger is not. Quicker, easier, more seductive. - Yoda
Treb
@annakata: true but <tr rowID="12" /> is very useful, the browser doesn't need to do anything with it but my Javascript libraries find it useful
AnthonyWJones
I think this ship has already sailed. Yeah, supporting bad HTML leads to a lot of browser bloat, but all the browsers already support it, and will for the foreseeable future.
jrockway
(goddammit I keep tripping on "it's" in my typing fury - thanks diodeus)
annakata
+5  A: 

XHTML is an attempt to encourage the development of "well-formed" HTML.

HTML has evolved over more than 10 years. Its implementation, and the implementation of the browsers that parse and render it, are not exactly consistent. This is why cross-browser compatibility is a major headache.

HTML is based on SGML (Standard Generalized Markup Language.) XML is also derived from SGML, so they are cousins of a sort. XHTML marries the two, providing (in theory) the benefits of XML to HTML. This includes a well-defined schema that can be reliably validated, queried, and transformed.

Dave Swersky
+21  A: 

XHTML also allows you to embed other XML dialects like MathML, Ruby, SVG, etc. (You can also embed XHTML in other XML dialects, if desired.)

If you are just 'making a web page', you don't necessarily need XHTML. But if you are programmatically generating a page, you might find that the tools for generating XML are better than those that generate HTML.
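
As a rough sketch of the "programmatically generating" case, assuming Python and its standard-library xml.etree.ElementTree (neither is mentioned in the answer, and the tiny page with a circle is made up for illustration), an XML toolkit can build an XHTML tree with an inline SVG element and serialise it:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Make XHTML the default namespace and give SVG a readable prefix on output.
ET.register_namespace("", XHTML_NS)
ET.register_namespace("svg", SVG_NS)

# Build the XHTML skeleton.
html = ET.Element(f"{{{XHTML_NS}}}html")
body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
p = ET.SubElement(body, f"{{{XHTML_NS}}}p")
p.text = "A circle:"

# Embed another XML dialect (SVG) directly inside the XHTML body.
svg = ET.SubElement(body, f"{{{SVG_NS}}}svg", {"width": "40", "height": "40"})
ET.SubElement(svg, f"{{{SVG_NS}}}circle", {"cx": "20", "cy": "20", "r": "10"})

# Serialise; the library takes care of closing tags, escaping and xmlns declarations.
document = ET.tostring(html, encoding="unicode")
print(document)
```

Because the output is produced from a tree rather than by string concatenation, it is well-formed by construction, which is the benefit the answer points at.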

jrockway
Perfect explanation of the role of XML. Thanks! Voted up for you :)
Prashant
+13  A: 

From Wiki:

Because they need to be well-formed, true XHTML documents allow for automated processing to be performed using standard XML tools—unlike HTML, which requires a relatively complex, lenient, and generally custom parser. XHTML can be thought of as the intersection of HTML and XML in many respects, since it is a reformulation of HTML in XML.

Having HTML conform to XML standards allows for much more consistent parsing of the page. In HTML, for example, browsers let you get away with overlapping tags such as <b><u>test</b></u>. In XHTML you can't: tags must be closed in the reverse of the order in which they were opened. Things like this make DOM parsing (which is now used heavily in AJAX) much easier.
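
To make the contrast concrete, here is a small sketch assuming Python's standard library (not something the answer itself uses; the TagLogger class is a hypothetical helper). The lenient HTML tokenizer in html.parser reports the overlapping tags without complaint, while the XML parser refuses the same input:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagLogger(HTMLParser):
    """Hypothetical helper: records start and end tags in the order seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


misnested = "<b><u>test</b></u>"

# The HTML tokenizer accepts the overlapping tags as they come.
logger = TagLogger()
logger.feed(misnested)
logger.close()
print(logger.events)

# The XML parser rejects the same markup as not well-formed.
try:
    ET.fromstring(misnested)
    xml_ok = True
except ET.ParseError:
    xml_ok = False
print(xml_ok)
```

Note that html.parser only tokenizes; deciding what tree the overlapping tags should produce is left to the caller, which is exactly the kind of guesswork the answer describes.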

Parrots
IOW the major benefit is to the browser itself rather than the web developer. Web developers may benefit but typically in a secondary sense.
AnthonyWJones
Having said that, +1 because it's the _real_ motive behind XHTML.
AnthonyWJones
+18  A: 

I am actually writing this to ask why the above three posts, which speak about browser consistency and well-formed HTML, have been voted down.

As is well known, HTML is an industry standard. Browsers are implemented so that they render the marked-up content as described in the HTML standard. Unfortunately, there are areas that have not been well defined in HTML: what happens if the author forgot a closing tag, or what to do if a referenced image is not found? Some browsers use the 'alt' attribute as placeholder text, and some display it as a tooltip. The famous 'quirks' mode of browsers is a result of this lack of clarity. Because of this, it became quite possible that the same web page would display differently in different browsers.

Also as HTML usage grew there was one more problem: it was not extensible - there was no way to add user-defined tags.

XHTML solves the above problems:

  • adopt XML to provide extensible tags.
  • provide a 'strict' standard for web browsers

XHTML has well-defined rules about structure, and these can be programmatically enforced. Check the various online "XHTML Validators". They will tell you whether your XHTML is well formed or not (and highlight the problem areas). Because of these strict rules, your page is more or less guaranteed to look the same on all browsers implementing XHTML.

[note] if you want to verify the above, please refer to the text "Head First XHTML and CSS"

Sesh
I don't get this line: "Because of these strict rules your page is more or less guaranteed to look the same on all browsers implementing XHTML." More or less?
Prashant
@Prashant: If a browser says it supports XHTML, it would first validate the XHTML before starting to render. If you pass an invalid XHTML page to such a browser, you would see an 'invalid format' kind of message. You don't see this because most websites don't bother, so the browsers have become lenient.
Sesh
Yups, you're right, even we don't care about validating our web-app's markup. Thanks for the explanation!
Prashant
I asked this under the first answer to this question, but I am asking you here too: if I want to answer the interviewer, what would be the short, simple and convincing answer to the question "Why do we need XHTML?"
Prashant
Or let's say I am the interviewer and I am asking you this question. What would your answer be?
Prashant
My answer would be "to provide extensible tags and to ensure a page displays the same in all browsers". Like I said, do read the first chapter of that easy book (Head First XHTML and CSS).
Sesh
@Sesh - Thanks for helping me to understand the concept :)
Prashant
XHTML doesn't promise anything about displaying the same across browsers. That is the job of CSS...
Andrew Vit
@Andrew: I was talking about avoiding the quirks mode where each browser can decide what it wants to do.
Sesh
Quirks mode can be easily avoided in HTML as well by just including <!DOCTYPE HTML> at the top. That triggers full standards mode in all browsers.
Joeri Sebrechts
“XHTML solves the above problems” — in theory, yeah. But in practice (which means on the web, which means 6 billion monkeys — that’s us — bashing away at > 6 billion keyboards), it didn’t solve those problems.
Paul D. Waite
+2  A: 

XHTML forces you to write cleaner code, which is easier to maintain, renders more consistently, and is easier to hook into the DOM. Comparing XHTML to HTML is like comparing a programming language that is strongly typed to one that is loosely typed.

As mentioned above, XHTML allows you to play with SVG and MathML. I'd like to add RDFa to that list. RDFa allows you to add semantics to your content that is not covered by microformats. I've personally been doing a lot with Dublin Core and Friend-of-a-Friend.

Scott
+1  A: 

XHTML is simply about communication between systems. HTML is very difficult to parse, because of the number of variations that can occur as to what is well formed. Since XML is strict in its interpretation, this problem has been removed.

Think about a RESTful architecture. If a URL is a permanent location for an item, then systems that want to access this item should be able to consume the information returned from accessing the URL. XHTML doesn't make this possible per se, because a system could already parse the HTML and retrieve the necessary information. XML just makes this easier. There is no limiting predefined set of tags that makes it difficult to classify data in a document (although technically you can do this in HTML too, because browsers will ignore unknown tags). You can use whatever you want to classify the data that is retrieved.

Kevin
A: 

If I want to crawl your site and parse its contents, I can only do it if it's XML.

Parsing HTML is a nightmare.

Ian Boyd
HTML parsers should be available in almost any language.
Casebash
Point is: they're not trivial to write. An XML parser already ships with my operating system, has been widely tested, is supported, and gets security updates. The HTML parser that I happen to find from *the guy on a web-site* doesn't have those features. And if I write an HTML parser, I'm going to follow the HTML spec.
Ian Boyd
A: 

XML is a data interchange format - this is perfect for building websites because after all we are dealing with information and this info needs to be crawled and understood by computers (such as search engines).

Matthew James Taylor
A: 

Because XHTML makes a lot more sense!

The point is, even though something might not provide any more technical possibilities, it's still an improvement if it's remade just to be more clear and logical. That's why code refactoring is a good idea even if it doesn't change any of the functionality. That's why Brainfuck wouldn't be a good programming language, even if it had all the capabilities of Java.

XHTML makes more sense because the underlying structure of tags and their attributes is always consistent, not dependent on the tag semantics. How it makes more sense becomes pretty evident once you get familiar with its differences from HTML. For example: tags are always properly nested, all tags must be closed, names must be lowercase, and attribute values must be enclosed in quotes.

Ilari Kajaste
Not a very useful answer
Casebash
Well, true, not that useful. But I stand by it, and it has a point too. I'll try to elaborate on that.
Ilari Kajaste
+1  A: 

Why was XHTML created?

  • HTML is not very extensible. XHTML aimed to fix this by introducing namespaces so that languages such as MathML or SVG could be included inline.
  • XML is much simpler to parse than SGML (the format used by HTML before version 5)
  • Due to an overwhelming number of websites with errors, browsers attempted to correct incorrect markup. New browsers have had to attempt to correct it in the same way. XHTML tries to increase standards by specifying that only structurally correct code will display.

How well has it succeeded?

  • XHTML is widely spread, but almost always served with the text/html MIME type due to incompatibilities with Internet Explorer (up to version 8). Many of these pages would actually break if served as XML. So none of the three advantages above have really materialised.
  • Many people chose to use XHTML as they thought it would provide better future compatibility. Work has stopped on XHTML 2.0, and while HTML5 will have an XHTML serialisation, this seems to be receiving minimal attention. XHTML provides no future compatibility advantages for the foreseeable future. Mozilla and Safari recommend using just HTML.
  • HTML with a strict DTD already has a much cleaner format. HTML5 will take this further by removing the transitional DTD, removing unnecessary elements and defining a standard way for parsing documents with a degree of backwards compatibility. Browsers will still correct errors for the HTML serialisation, rather than forcing the markup to be fixed, but at least they will do it in the same way. Those who care about correct code will use validators anyway.

What is the need for XHTML?

XHTML had laudable goals and maybe it will be able to deliver in the future. I can't recommend XHTML for the possible future advantages it might provide, when HTML is much easier now. You should only really use XHTML if previous code or your tools force you to.

Casebash
A: 

In a nutshell: XHTML is often only beneficial and preferred over HTML when you want to use an XML-based tool to manipulate/transform/generate HTML pages on the server side.

Many examples can be found in component-based MVC frameworks like Sun/Oracle JSF, which uses Facelets as an XHTML-based view technology. The server-side components are defined in XSDs and the pages are parsed using a SAX parser. You can even add a <!DOCTYPE html> to the top of the page to let Facelets generate "pure", valid and strict HTML5. Microsoft ASP.NET MVC has a similar view technology.

When you're hand-writing HTML, XHTML doesn't add much benefit, unless you're after the "coolness" of using an (over)hyped technology.

BalusC
A: 

I see a bunch of up-voted answers here that are making incorrect assumptions about how browsers work. So let me give my 2 cents on the matter.

First of all, why does XHTML exist?

From the horse's mouth:

a two-day workshop was organised to discuss whether a new version of HTML in XML was needed. The opinion at the workshop was a clear 'Yes': with an XML-based HTML other XML languages could include bits of XHTML, and XHTML documents could include bits of other markup languages. We could also take advantage of the redesign to clean up some of the more untidy parts of HTML, and add some new needed functionality, like better forms.

In short, XHTML was created for two reasons:

  • To allow mixing other content (like MathML and SVG) in the same document with clear formatting rules.
  • To extend and clean up HTML.

Making things easier to validate was not a design goal, and also not something that was necessary because HTML4 validators exist and are comprehensive.

Is XHTML easier to parse for browsers?

Yes and no. XML is easier to parse than HTML tag soup, but unless you use an application/xhtml+xml or application/xml MIME type for your XHTML page, browsers parse it using the HTML parsing engine. However, if you do use XML MIME types, IE chokes on your content. This behavior is explained on the IE blog. There is no difference in how browsers treat XHTML and HTML if you are serving it with a MIME type of text/html!

Yes they do! You lie!

Indeed they do, but only because of the doctype. Browsers use doctypes at the top of HTML documents to determine whether they should use standards mode or quirks mode (= bugs mode). All valid XHTML documents happen to include a doctype that triggers standards mode. However, in HTML you can get the same result by including "<!doctype html>" at the top of your page.

So are you saying XHTML has no purpose?

Not at all. XHTML has many advantages:

  • It can be transformed using XML tools, like XSLT
  • It can be parsed more easily in server-side code
  • It can integrate custom markup while still passing a validation test
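
A minimal sketch of the server-side parsing point, assuming Python's standard-library xml.etree.ElementTree and a made-up XHTML page (the answer names neither). Python's standard library has no XSLT processor, so this only shows path-style queries, not a transformation:

```python
import xml.etree.ElementTree as ET

# A small, made-up XHTML document in the XHTML namespace.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>See <a href="/one">one</a> and <a href="/two">two</a>.</p>
  </body>
</html>"""

# Map a prefix of our choosing to the XHTML namespace for queries.
NS = {"x": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(page)

# Pull out the title and every link target with simple path expressions.
title = root.findtext("x:head/x:title", namespaces=NS)
links = [a.get("href") for a in root.findall(".//x:a", NS)]

print(title)
print(links)
```

The same queries would fail at the parsing step on tag-soup HTML, which is why this advantage only applies when the markup really is well-formed XHTML.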

So, I should use it then?

As always, the answer is "it depends".

  • Server-side, possibly useful. If you want to have the server-side advantages of XML, you want to be using an XHTML variant, whether that is XHTML1 (HTML4 serialization as XML) or XHTML5 (HTML5 serialization as XML).
  • Client-side, not useful. I would highly recommend avoiding serving your users an XML MIME type. XML parsing doesn't blend with graceful error handling, producing only an "XML parsing error" instead of a document if you have any markup issue in your page. Unless you never write bugs, you will need graceful error handling.

What about HTML5? Does it compete with XHTML?

No it doesn't. HTML5 has two serializations, one as HTML, and one as XML. The benefit is that both now have strict parsing rules. You will get predictable behavior in all browsers regardless of the approach you use. However, HTML5 parsed as HTML has the benefit of graceful error handling. That's why I prefer that approach. As always, YMMV.

Joeri Sebrechts