If the content type and character set are already declared in an HTTP header sent from PHP, is there any reason to declare them again in the HTML, in the usual `<meta>` tag after the DOCTYPE?

<?php ob_start( 'ob_gzhandler' );
 header('Content-Type: text/html; charset=utf-8'); // here
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <!-- and here -->
...
+5  A: 

If you are sending the charset in the headers, there is no need to repeat it in the HTML markup.

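It is better to send this information in one place (the DRY principle). If the charsets conflict (e.g. a header with UTF-8 and a meta tag with ISO-8859-1), browsers give the HTTP header precedence, so the meta tag is silently ignored and the page may be decoded differently from what the markup claims.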

Having said that, some automated tools (web scrapers) may not look at the header and may deduce the page encoding only from the meta tag.

If you do send both, keep the header and the meta tag identical on every page; mismatched charsets can confuse clients and cause display issues.
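
One way to catch such a mismatch is to compare the two declarations from the outside, the way a scraper or a test would see them. The following is only a rough sketch in Python: the helper names (`header_charset`, `meta_charset`, `charsets_agree`) are made up for illustration, and the regular expression is a simplification that assumes the meta tag appears in the first 1024 bytes of the page.

```python
import re
from email.message import Message

# Hypothetical helper names, not from the answers in this thread.
# Matches both <meta charset="..."> and the http-equiv form.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def header_charset(content_type):
    """Return the lowercased charset from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html_bytes):
    """Return the lowercased charset from the first meta declaration, or None."""
    match = META_CHARSET.search(html_bytes[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def charsets_agree(content_type, html_bytes):
    """True unless both declarations exist and name different charsets."""
    from_header = header_charset(content_type)
    from_meta = meta_charset(html_bytes)
    if from_header is None or from_meta is None:
        return True  # only one declaration, so nothing can conflict
    return from_header == from_meta


page = b'<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>'
print(charsets_agree("text/html; charset=utf-8", page))
print(charsets_agree("text/html; charset=iso-8859-1", page))
```

Note that this compares the names as plain strings, so aliases such as `utf8` and `utf-8` would be reported as a mismatch.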

Oded
Thanks for the quick answer! If that's the case, then what is the proper markup based on the above? Can I remove that whole `<meta>` line?
Isaac Lubow
@Isaac Lubow - The `meta` tag can be removed, as it only restates the charset from the header.
Oded
+2  A: 

Having the charset in the HTML source may be helpful if someone decides to save a page, or for web scrapers :). libxml looks up the meta tag to determine the charset to use when parsing the markup. Show your fellow developers some web scraping love.

Ionuț G. Stan
+1 Hey, good point!
Pekka
A: 

If you declare it in the HTTP headers, then it will survive transcoding by proxies and won't ever trigger a "Whoops, I guessed the wrong encoding, restart parsing from top" situation in browsers.

If you declare it in the body of the document then it will survive being accessed outside of HTTP (or another system with content-type headers, such as email).

If you declare it in both then you get the best of both worlds so long as no transcoding happens.
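
To make the trade-off concrete, here is a small Python sketch of how a consumer might pick the encoding: use the charset from HTTP when there is one, otherwise fall back to the meta tag, otherwise to a default. The function name `decode_page` and its parameters are invented for this illustration, and the meta lookup is a simplified regex over the first 1024 bytes, not a full HTML parser.

```python
import re

# Hypothetical names for illustration only.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def decode_page(raw, http_charset=None, default="utf-8"):
    """Decode page bytes, preferring the HTTP charset, then the meta tag."""
    if http_charset:
        encoding = http_charset
    else:
        match = META_CHARSET.search(raw[:1024])
        encoding = match.group(1).decode("ascii") if match else default
    # An unknown encoding name raises LookupError; undecodable bytes
    # are replaced with U+FFFD instead of raising.
    return raw.decode(encoding, errors="replace")


# A page saved to disk: no HTTP headers survive, only the meta tag.
saved = (
    '<html><head><meta charset="iso-8859-1"></head>'
    "<body>caf\u00e9</body></html>"
).encode("iso-8859-1")

print("caf\u00e9" in decode_page(saved))
print("caf\u00e9" in decode_page(saved, http_charset="utf-8"))
```

The second call shows the other side: when a header declares the wrong charset, it wins over the correct meta tag and the text is garbled.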

Note that if you don't use UTF-8 or UTF-16 then the XML spec requires that you specify the encoding in the XML prolog (and that using an XML prolog will trigger Quirks mode in IE6).

David Dorward