How can I know what encoding PHP will use when sending data to the browser? That is, the encoding declared in the Content-Type header, for instance iso-8859-1.

A: 

You can set your own with header('Content-type: xxx/yyy');, but I believe that text/html is sent by default.

William Keller
I don't want to set it, I want to know what default content-type will be set, if I don't set one.
Alessandro Vernet
A: 

AFAIK, PHP sends strings bytewise. That is, if your variables hold UTF-8, it will send UTF-8. If you have iso-8859-1, it will send that too. If you mix them, it won't be pretty.

Javier
Right - and I am using the mb_* string conversion function to convert my strings to the "encoding used by PHP by default". But to do the conversion correctly, I need to know the encoding that PHP will be using.
Alessandro Vernet
+1  A: 

You can use the header() solution that William suggested; however, if you are running Apache and the Apache config sets a default charset, that will win every time (and Internet Explorer will go crazy). See: AddDefaultCharset

jasonbar
+1  A: 

Keep in mind that content-types and encodings are two different things. text/html is a content-type; ISO-8859-1 and UTF-8 are encodings.

The HTTP response header that the server sends typically looks like this:

Content-Type: text/html; charset=utf-8

"charset" is actually the character encoding. It's not in a separate header; however there is a header called "Content-Encoding" which actually specifies what kind of compression the response uses (e.g. gzip).
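To make the split between content-type and charset concrete, here is a minimal sketch that pulls the charset parameter out of a Content-Type value. The function name charset_from_content_type is made up for this example, and the regular expression only covers the common simple forms of the header, not every form the HTTP grammar allows.

```php
<?php
// Hypothetical helper (not part of PHP): extract the charset parameter
// from a Content-Type header value, or return null if there is none.
function charset_from_content_type(string $value): ?string
{
    // Look for ";charset=" followed by an optionally quoted token.
    if (preg_match('/;\s*charset\s*=\s*"?([^";\s]+)"?/i', $value, $m)) {
        return strtolower($m[1]); // charset names are case-insensitive
    }
    return null; // only a media type was given, no charset
}

var_dump(charset_from_content_type('text/html; charset=utf-8'));
var_dump(charset_from_content_type('text/html'));
```

The first call yields the charset, the second yields null because the header carries only the content-type.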

If you want to change the character encoding to UTF-8, in a file that contains HTML:

<?php
header("Content-Type: text/html; charset=utf-8");
dirtside
Maybe my question wasn't clear: I don't want to set the encoding. I want to know what encoding will be set by PHP if I don't set one. I want to know what the default encoding is.
Alessandro Vernet
A: 

If your server is not configured to have a default content type or charset, and neither is PHP, PHP will send only Content-Type: text/html - it won't specify a charset at all, and will send the bytes as it sees them in the script.
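On the PHP side, the two settings involved are the default_mimetype and default_charset ini directives, and both can be read with ini_get(). The sketch below rebuilds the header value PHP would be expected to send from them; build_content_type is a made-up helper name, and the result says nothing about what the web server adds on top. Note that since PHP 5.6 default_charset defaults to "UTF-8", so on current installs a charset is usually present unless it has been cleared.

```php
<?php
// Hypothetical helper: combine a media type and a charset the way the
// Content-Type header is written. An empty charset means "no charset".
function build_content_type(string $mimetype, string $charset): string
{
    return $charset === '' ? $mimetype : $mimetype . '; charset=' . $charset;
}

// ini_get() returns false for unknown settings; the cast turns that into ''.
$mimetype = (string) ini_get('default_mimetype');
$charset  = (string) ini_get('default_charset');

echo build_content_type($mimetype, $charset), "\n";
```

Inside a running web request, headers_list() shows the headers PHP has queued so far, which is a way to cross-check an explicit header() call.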

If a browser receives a page without charset specified, various things can happen:

  • most browsers have an "Encoding/Charset" menu; if the user explicitly selects one, the browser will try to apply it. Doesn't happen too often, so:
  • some browsers try to render it with a default charset (which is locale-dependent, e.g. for FF and cs_CZ it used to be iso-8859-2; YMMV)
  • IE will try to determine the charset heuristically (it will take a guess, based on character distribution - and many times it gets it right; sometimes it gets it wrong and you get a page in Romanian interpreted as Chinese text, which usually means "unreadable")
  • some old browsers will fall back on us-ascii

If, after this procedure, the PHP script's charset and the browser's charset match, the text will - accidentally - be readable. If not, you get garbled characters and similar phenomena.
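A rough way to see what such a mismatch looks like, assuming the mbstring extension is available: take the UTF-8 bytes for "é" and decode them as if they were ISO-8859-1, which is what a browser with the wrong charset would do. The variable names are only for illustration.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$utf8Bytes = "\xC3\xA9";

// Simulate a reader that wrongly assumes ISO-8859-1: each byte is taken as
// one character, then re-encoded as UTF-8 so the result can be printed.
$misread = mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');

echo $misread, "\n"; // two characters, "Ã©", instead of "é"
```

The bytes sent by PHP are the same in both cases; only the interpretation on the receiving end differs.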

Piskvor