How can I know what encoding PHP will use when sending data to the browser, i.e. the charset given in the Content-Type header (for instance, iso-8859-1)?
You can set your own with header('Content-Type: xxx/yyy'), but I believe that text/html is sent by default.
AFAIK, PHP sends strings bytewise. That is, if your variables hold UTF-8, it will send UTF-8. If they hold iso-8859-1, it will send that too. If you mix them, it won't be pretty.
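To illustrate the "bytewise" point, here is a minimal sketch. The variable names are made up for the example; it only relies on strlen() and bin2hex(), which both operate on raw bytes.

```php
<?php
// The same character "é" stored in two different encodings.
// PHP strings are plain byte sequences with no encoding attached.
$utf8   = "\xC3\xA9"; // "é" in UTF-8 (two bytes)
$latin1 = "\xE9";     // "é" in ISO-8859-1 (one byte)

// strlen() counts bytes, not characters.
$utf8Length   = strlen($utf8);
$latin1Length = strlen($latin1);

// bin2hex() shows the exact bytes that echo would write to the response.
$utf8Hex   = bin2hex($utf8);
$latin1Hex = bin2hex($latin1);

echo "UTF-8:      {$utf8Length} byte(s), hex {$utf8Hex}\n";
echo "ISO-8859-1: {$latin1Length} byte(s), hex {$latin1Hex}\n";
```

Whichever of these byte sequences ends up in your output is what the browser receives; PHP does not convert between them on its own.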
You can use the header() solution that William suggested. However, if you are running Apache and the Apache config sets a default charset, that will win every time (and Internet Explorer will go crazy). See: AddDefaultCharset
Keep in mind that content-types and encodings are two different things. text/html is a content-type; ISO-8859-1 and UTF-8 are encodings.
The HTTP response header that the server sends typically looks like this:
Content-Type: text/html; charset=utf-8
"charset" is actually the character encoding. It's not in a separate header. There is, however, a header called "Content-Encoding", which specifies what kind of compression the response uses (e.g. gzip).
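As a rough sketch of how the content type and the charset sit in the same header value, the function below splits them apart. parseContentType() is a hypothetical helper written for this example, not a PHP built-in, and it assumes PHP 7 or later for the type declarations. It only looks for a charset parameter and ignores anything else.

```php
<?php
// Hypothetical helper: splits a Content-Type value into its media type
// and its charset parameter (null when no charset is given).
function parseContentType(string $value): array
{
    // Parameters are separated from the media type by semicolons.
    $parts = array_map('trim', explode(';', $value));
    $mediaType = strtolower(array_shift($parts));

    $charset = null;
    foreach ($parts as $param) {
        // Parameter names are case-insensitive, hence stripos().
        if (stripos($param, 'charset=') === 0) {
            $charset = trim(substr($param, strlen('charset=')), "\"'");
        }
    }

    return ['type' => $mediaType, 'charset' => $charset];
}

$parsed = parseContentType('text/html; charset=utf-8');
echo $parsed['type'], ' / ', $parsed['charset'], "\n";
```

For a value with no charset, such as 'text/html', the 'charset' entry comes back as null, which is the situation where the browser has to guess.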
If you want to declare UTF-8 as the character encoding in a file that outputs HTML:
<?php
// Must be called before any output is sent to the browser.
header("Content-Type: text/html; charset=utf-8");
If your server is not configured to have a default content type or charset, and neither is PHP, PHP will send only Content-Type: text/html. It won't specify a charset at all, and will send the bytes as it sees them in the script.
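To check what PHP itself is configured to declare, you can read the default_charset ini setting. The sketch below wraps that in a hypothetical helper, describeDefaultCharset(), which is not part of PHP. It only covers PHP's own setting and says nothing about what the web server may add afterwards.

```php
<?php
// Hypothetical helper: reports the charset PHP would append to its
// default Content-Type header, based on the default_charset ini setting.
function describeDefaultCharset(): string
{
    $charset = ini_get('default_charset');

    // An empty (or unavailable) setting means no charset parameter is added.
    if ($charset === false || $charset === '') {
        return 'no charset parameter (Content-Type: text/html only)';
    }

    return 'charset=' . $charset;
}

echo describeDefaultCharset(), "\n";

// Under a web server, headers_list() shows the headers queued so far,
// including any Content-Type set explicitly with header().
// On the command line it is typically empty.
foreach (headers_list() as $headerLine) {
    echo $headerLine, "\n";
}
```

Calling header() with an explicit Content-Type, as shown earlier, takes precedence over this default for the current response.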
If a browser receives a page without charset specified, various things can happen:
- most browsers have an "Encoding/Charset" menu; if the user explicitly selects one, the browser will try to apply it. Doesn't happen too often, so:
- some browsers try to render it with a default charset (which is locale-dependent, e.g. for FF and cs_CZ it used to be iso-8859-2; YMMV)
- IE will try to determine the charset heuristically (it will take a guess, based on character distribution - and many times it gets it right; sometimes it gets it wrong and you get a page in Romanian interpreted as Chinese text, which usually means "unreadable")
- some old browsers will fall back on us-ascii
If, after this procedure, the PHP script's charset and the browser's charset happen to match, the text will be readable, but only by accident. If not, you get weird symbols and similar garbage.
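The "weird signs" can be reproduced with a small sketch. It assumes the mbstring extension is available, and it simulates a browser that reads UTF-8 bytes as ISO-8859-1 by converting them as if they were ISO-8859-1.

```php
<?php
// The script outputs "é" as UTF-8: the two bytes C3 A9.
$sentBytes = "\xC3\xA9";

// Simulate a browser that assumes ISO-8859-1: each byte is treated as
// its own character. Converting "from ISO-8859-1 to UTF-8" gives the
// text such a browser would display (assumes mbstring is installed).
$misread = mb_convert_encoding($sentBytes, 'UTF-8', 'ISO-8859-1');

// One intended character has turned into two unrelated ones.
$intendedChars = mb_strlen($sentBytes, 'UTF-8');
$misreadChars  = mb_strlen($misread, 'UTF-8');

echo "intended characters: {$intendedChars}\n";
echo "displayed characters: {$misreadChars}\n";
echo "displayed bytes (hex): ", bin2hex($misread), "\n";
```

Declaring the charset that matches the bytes you actually send is what keeps the browser from guessing like this.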