I know the purpose of DOCTYPE (and what each url/identifier on the line is) as far as web standards and page validation goes, but I am unsure about what it actually "is" in the context of an XML document.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>My Page</title>
  </head>
  <body>
    <p>Hello</p>
  </body>
</html>

Is it part of the actual XML document structure, or is it some kind of comment-like "hint" that is noted then stripped?

What is the significance of the "!" before the name? Does this denote a special type of "element"? What are they called?

The example I posted is XHTML for the web, but is DOCTYPE also used in general purpose XML documents?

A: 

This isn't an answer, but it reminds me of Joel's article Martian Headsets:

DOCTYPE is a myth.

A mortal web designer who attaches a DOCTYPE tag to their web page saying, “this is standard HTML,” is committing an act of hubris. There is no way they know that. All they are really saying is that the page was meant to be standard HTML. All they really know is that they tested it with IE, Firefox, maybe Opera and Safari, and it seems to work. Or, they copied the DOCTYPE tag out of a book and don’t know what it means.

Jeffrey Knight
Boo - I'm trying to take the time to understand what I'm typing :( I'm not fussed about real-world browser peculiarities at this point.
frou
That downvote wasn't me, BTW!
frou
+5  A: 

DOCTYPE was "inherited" from SGML (it was supposed to point to a DTD file that explains how to parse the document), but XML's self-explanatory syntax and namespaces have made it largely irrelevant. About the only real use left for a DOCTYPE/DTD in XML is to define the allowed named entities (e.g. &nbsp;).
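To make the named-entity point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The `doc` documents and the helper names are made up for illustration. The first document declares `nbsp` in an internal DTD subset, so the reference resolves; the second has no DOCTYPE, so the same reference is a well-formedness error.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: one declares the entity in an internal DTD subset,
# the other uses the same entity with no DOCTYPE at all.
WITH_DTD = '<!DOCTYPE doc [<!ENTITY nbsp "&#160;">]><doc>a&nbsp;b</doc>'
WITHOUT_DTD = "<doc>a&nbsp;b</doc>"


def text_of(xml_source: str) -> str:
    """Return the text content of the root element."""
    return ET.fromstring(xml_source).text


def is_well_formed(xml_source: str) -> bool:
    """True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False
```

With these definitions, `text_of(WITH_DTD)` should contain a real no-break space (U+00A0) between `a` and `b`, while `is_well_formed(WITHOUT_DTD)` should be `False` because `nbsp` was never declared.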

The XML spec even allows "non-validating" parsers, which may ignore the DTD file completely (web browsers use such parsers, unless you've fallen into the text/html trap, in which case an XML parser is not used at all).
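As a rough illustration of what "non-validating" means in practice: Python's standard ElementTree is built on expat, which is a non-validating parser. In the sketch below (the document and the `child_tags` helper are invented for the example), the DTD says `doc` must contain exactly one `a`, yet a document containing `b` instead still parses, because only well-formedness is checked.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical document that is well-formed but NOT valid against its own DTD:
# the DTD requires <doc> to contain a single <a>, but it contains <b/>.
INVALID_BUT_WELL_FORMED = """<!DOCTYPE doc [
  <!ELEMENT doc (a)>
  <!ELEMENT a EMPTY>
]>
<doc><b/></doc>"""


def child_tags(xml_source: str) -> List[str]:
    """Parse the document and list the tag names of the root's children."""
    root = ET.fromstring(xml_source)
    return [child.tag for child in root]
```

Calling `child_tags(INVALID_BUT_WELL_FORMED)` raises no error, which is exactly the behaviour the spec permits for non-validating parsers.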

A DTD is quite poor for the purpose of validation (it is hard to specify rules for more than one level of nesting, and there is no way to specify attribute types beyond a few predefined ones). XML Schema and RelaxNG can be far more precise.

DTDs don't fully support namespaces either, which leads to ridiculous workarounds like the XHTMLplusMathMLplusSVG DOCTYPE.

In web browsers, certain DOCTYPEs have the desirable side effect of triggering standards-compliant rendering mode. This is more of a hack than an intended use of DOCTYPEs.

  • If you're using real XHTML (application/xhtml+xml, the one that doesn't open in IE at all), then don't use a DOCTYPE at all (that's the recommendation from XHTML 5). XML mode will trigger standards-compliant rendering regardless of DOCTYPE.

  • If you're using text/html mode, then use <!DOCTYPE html>. That's the HTML 5 DOCTYPE, and it's the shortest one that triggers the best possible rendering in all browsers. Browsers don't use DOCTYPE for any other purpose, so you're not missing out on anything.

  • If you're processing XHTML files with XML parsers (outside browsers), then please don't forget to set up a DTD catalog properly, otherwise your parser may be DoS-ing w3.org by trying to fetch the DTD every time. If you can't use a DTD catalog, then disable "externals" in the parser, or omit the DOCTYPE and don't use named entities (i.e. use &#160; rather than &nbsp;).
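For the last case, here is a sketch of what disabling "externals" can look like with Python's standard xml.sax module. The `TextCollector` class and `extract_text` function are illustrative names, not part of any library. The two feature flags are real SAX features; recent Python versions already leave external general entities off by default, so setting them explicitly mostly documents the intent. The sample document uses &#160; instead of a named entity, as suggested above.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)

XHTML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <body><p>Hello&#160;world</p></body>
</html>"""


class TextCollector(ContentHandler):
    """Hypothetical handler that gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def extract_text(data: bytes) -> str:
    parser = xml.sax.make_parser()
    # Do not resolve external general or parameter entities, so the
    # external DTD named in the DOCTYPE is never requested.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks).strip()
```

The DOCTYPE line is still read, but the parser does not go out to w3.org for the DTD. Other parsers expose this under different option names, so check the documentation for the one you actually use.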

porneL
Real XHTML == application/xhtml+xml? Today's XHTML is still 1.1 and thus requires the DOCTYPE, right?
frou
@frou: yes, XHTML is treated as XML only in application/xhtml+xml. "Strictly conforming" XHTML 1.1 requires DOCTYPE indeed.
porneL
+2  A: 

DOCTYPE is part of the XML specification (see the relevant subsection here) and can include either a link to a DTD, "internal" DTD declarations, or both. Many "modern" uses of XML don't use a DOCTYPE at all, though - as porneL mentions, both XML Schema and RelaxNG are more powerful ways to specify a document's syntax. See this Tim Bray blog post for a bit more background.
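On the original question of whether the DOCTYPE is part of the document structure or just a stripped hint: in a DOM it shows up as its own node type, a sibling of the root element under the document node. A small sketch with Python's standard xml.dom.minidom, using a shortened version of the XHTML from the question (the `describe` helper is invented for the example):

```python
from xml.dom import Node, minidom

XHTML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>My Page</title></head>
  <body><p>Hello</p></body>
</html>"""


def describe(data: bytes) -> dict:
    """Summarise the top-level structure of a parsed document."""
    doc = minidom.parseString(data)
    return {
        # Node types of the document's direct children, in order.
        "top_level": [node.nodeType for node in doc.childNodes],
        # The three parts of the DOCTYPE declaration.
        "name": doc.doctype.name,
        "public_id": doc.doctype.publicId,
        "system_id": doc.doctype.systemId,
        "root": doc.documentElement.tagName,
    }
```

The document node ends up with two children: a document type node carrying the name, public identifier and system identifier, followed by the `html` element. So the declaration is kept as structure, but it is not an element, which fits the different `<!...>` syntax.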

Greg Campbell
It is part of XML because it is part of SGML, and XML is a form of SGML.
John Saunders
I thought there was some major overlap (I've used XSD in non-web stuff)
frou