Many doctypes include a URL, like this:

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

and this DTD file is available at the live URL http://www.w3.org/TR/html4/strict.dtd

What is the purpose of this online DTD, and how can a page that uses this doctype render properly without access to this URL (I mean, if internet access is not available)?

Update: I found this on Wikipedia: http://en.wikipedia.org/wiki/System_identifier

In HTML and XML, a system identifier is a fragmentless URI reference. It typically occurs in a Document Type Declaration. In this context, it is intended to identify a document type which is used exclusively in one application, whereas a public identifier is meant to identify a document type that may span more than one application.

In the following example, the system identifier is the text contained within quotes:

Update 2: Is it only used by validators? How does software like Dreamweaver provide offline validation?

Update 3: I found this on the W3C site: http://www.w3.org/QA/Tips/Doctype

Why specify a doctype? Because it defines which version of (X)HTML your document is actually using (a version for which browser or validator?), and this is a critical piece of information needed by some tools (which tools? any tools other than validators?) processing the document.

For example, specifying the doctype of your document allows you to use tools such as the Markup Validator to check the syntax of your (X)HTML. Such tools won't be able to work if they do not know what kind of document you are using.

But the most important thing is that with most families of browsers, a doctype declaration will make a lot of guessing unnecessary, and will thus trigger a "standard" rendering mode.

+12  A: 

No, no browsers actually fetch or validate against the doctype. See "DTDs Don't Work on the Web" for a good argument for why fetching and validating DTDs is a bad idea.

The doctype is there, in theory, to tell what version of the standard the document uses. The browsers generally don't use this information, other than to switch between quirks and standards mode. All modern browsers accept the simplest possible doctype, with no URL or version information, <!DOCTYPE html>, for this purpose; because of this, HTML5 has adopted this as the recommended doctype.
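
As a rough illustration of mode switching from the doctype text alone, here is a minimal Python sketch. The function name `rendering_mode` and the short prefix list are assumptions made for this example; real browsers follow a much longer set of rules, and nothing here is taken from any browser's source. Note that it never touches the network.

```python
import re

# Simplified, illustrative list of legacy public identifier prefixes.
# Real browsers check a much longer list.
QUIRKS_PUBLIC_ID_PREFIXES = (
    "-//w3c//dtd html 3.2",
    "-//ietf//dtd html",
)

DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<name>[^\s>]+)'
    r'(?:\s+PUBLIC\s+"(?P<public>[^"]*)"(?:\s+"(?P<system>[^"]*)")?)?'
    r'\s*>',
    re.IGNORECASE,
)


def rendering_mode(markup: str) -> str:
    """Guess 'quirks' or 'standards' by inspecting the doctype string only."""
    match = DOCTYPE_RE.match(markup.lstrip())
    # No doctype at all, or a doctype for something other than html.
    if match is None or match.group("name").lower() != "html":
        return "quirks"
    public_id = (match.group("public") or "").lower()
    system_id = match.group("system")
    if public_id.startswith(QUIRKS_PUBLIC_ID_PREFIXES):
        return "quirks"
    # HTML 4.01 Transitional without a system identifier is treated as quirks.
    if (public_id.startswith("-//w3c//dtd html 4.01 transitional//")
            and system_id is None):
        return "quirks"
    # Browsers also have an "almost standards" mode; this sketch folds it
    # into "standards" for brevity.
    return "standards"
```

The system identifier URL is only ever compared or checked for presence as a string, which is why the page renders the same way offline.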

Validators sometimes use this information to tell what DTD to validate against, but DTDs embedded in the document aren't actually a very good way of specifying validation information. The problem with validating against a DTD referenced within a document is that the consumer of that document doesn't really care all that much whether the document is self-consistent, but whether it follows a schema that the consumer knows how to interpret reliably. Instead, it's generally better to validate against an external schema, in a more powerful schema language like RELAX NG.

When validators use this information, they frequently use the URI as an identifier only, not as a locator. That means that the validator already knows about all of the common HTML doctypes, and uses that knowledge for validation, instead of downloading from the URI referred to. This is in part to avoid the problem of having to download the DTD every time, and also because a DTD doesn't actually specify enough information to provide very good validation and error messages, so some parts of the validator may be specified in custom code or a more powerful schema language. For more information, see Henri Sivonen's thesis on his implementation of the validator.nu HTML5 conformance checker.
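
To make "identifier, not locator" concrete, here is a small Python sketch of how a tool might map a doctype to a schema it already ships with. The table `BUNDLED_SCHEMAS`, the file paths in it, and the function `bundled_schema_for` are all hypothetical names invented for this example, not the behavior of any particular validator.

```python
import re
from typing import Optional

# Hypothetical table of schemas bundled with the tool; the paths are made up.
BUNDLED_SCHEMAS = {
    "-//W3C//DTD HTML 4.01//EN": "schemas/html401-strict.dtd",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "schemas/xhtml1-transitional.dtd",
}

PUBLIC_ID_RE = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE
)


def bundled_schema_for(markup: str) -> Optional[str]:
    """Return the local schema path for a known public identifier, else None."""
    match = PUBLIC_ID_RE.search(markup)
    if match is None:
        return None
    # The identifier is used purely as a dictionary key; nothing is downloaded.
    return BUNDLED_SCHEMAS.get(match.group(1))
```

A tool built this way can validate offline, because the identifier only selects among files it already has.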

Some validators may also download and then cache DTDs, so they would need to be online once to download it, but will later work from the cached version.
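
The download-once-then-cache approach could look roughly like the following Python sketch. The function `cached_dtd` and its `fetch` parameter are assumptions for illustration; the caller supplies the actual download function, so the sketch itself makes no network calls.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_dtd(system_id: str, cache_dir: Path,
               fetch: Callable[[str], bytes]) -> bytes:
    """Return the DTD bytes for system_id, calling fetch only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URI so it is safe to use as a file name.
    digest = hashlib.sha256(system_id.encode("utf-8")).hexdigest()
    cache_file = cache_dir / (digest + ".dtd")
    if cache_file.exists():
        # Cache hit: works even when offline.
        return cache_file.read_bytes()
    data = fetch(system_id)  # Cache miss: needs connectivity this one time.
    cache_file.write_bytes(data)
    return data
```

In real use, `fetch` might wrap something like `urllib.request.urlopen(uri).read()`; after the first successful call, later runs read the DTD from disk.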

Brian Campbell
See my update 2.
metal-gear-solid
But how does a browser without an internet connection switch from quirks to strict mode if the XHTML file has this system identifier?
metal-gear-solid
Browsers do not actually load the DTD referenced in order to switch between quirks and standards mode; they just do a string match on the `<!DOCTYPE ...>` declaration. No (popular, modern) browser will ever actually load the DTD referenced. Most validators do not either. They just use it as an identifier. Note that many of the doctypes listed in the second link in my answer don't even have a URI, they just use FPIs (formal public identifiers http://en.wikipedia.org/wiki/Formal_Public_Identifier) like "-//W3C//DTD HTML 4.01//EN".
Brian Campbell
The use of the systemId as a quirks mode signifier is a browser hack that has no relation to the proper purpose of a systemId in locating the DTD external subset. Browsers chose to do this check because people who got the systemId wrong were the people who needed the quirks, but there's nothing magic about systemIds themselves.
bobince
+2  A: 

The URI is there to identify the document type uniquely - it is not meant for retrieval and no browser (or other piece of software) should rely on a document existing at that web address.

Oded
It's really more of a URI than a URL. Its purpose is to identify the document type - that this also works to actually locate it is incidental.
fennec
Thanks for the correction - very observant. Fixed.
Oded
It's very much meant for retrieval. An external-entity-including parser such as a validating parser (which a browser isn't) will need the DTD external subset; if it does not have a local copy it will have to dereference the system ID as a URI. You would hope most tools would have a local copy of the well-known HTML DTDs accessible under their public ID though.
bobince
It may be *meant* for retrieval, but that doesn't mean it's a good idea to rely on that. See the RSS 0.91 fiasco for why. If your application will fail to work because the user is offline, the server hosting the DTD is offline, or there is some sort of connectivity problem, then your application is broken. External entities are a bad idea, in SGML, in XML, or in HTML, which is why browsers made the right decision and don't retrieve external DTDs. Again, see http://hsivonen.iki.fi/no-dtd/
Brian Campbell
A: 

I used to wonder about that myself. But if you have your own HTTP server, it's pretty easy to prove that it doesn't matter. Just yank the cable to the outside world and see if you can still open the pages on your server.

John Knoeller
Heck, just yank the cable and look at a saved HTML page on your hard disk.
fennec
We don't need a server to prove it for just an HTML file.
metal-gear-solid
Depends on how much you trust the browser to behave the same for local files as it does for files it pulled down from the web.
John Knoeller