I need to create an application in PHP that can handle all Unicode characters in all places: edit fields, static HTML, database. Can somebody tell me the complete list of all parameters / functions that need to be set / used to achieve this goal?

+4  A: 

Apache

The server's default charset must either be unset or be set to UTF-8. This is done with the Apache AddDefaultCharset directive, which can go in a virtual host or in the main configuration file (see the documentation).

AddDefaultCharset utf-8

MySQL

  • Set the character set of the database to utf8, with a UTF-8 collation such as utf8_unicode_ci
  • Set the connection encoding. This can be done with mysqli_set_charset, as another answer mentions, or by sending this statement right after connecting:
    SET NAMES 'utf8' COLLATE 'utf8_unicode_ci'

PHP

1- You should set the HTML charset of the page to be UTF-8, via a meta tag on the page, or via a PHP header:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
-or-
    header('Content-type: text/html; charset=utf-8');

2- You should always use the mb_* versions of string-related functions, for example mb_strlen instead of strlen to get the length of a string in characters.
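
To illustrate point 2, here is a minimal sketch, assuming PHP 7 or later with the mbstring extension loaded. It compares the byte-oriented and character-oriented functions on one UTF-8 string; the variable names are only illustrative.

```php
<?php
// Assumes PHP 7+ and the mbstring extension; names are illustrative.
mb_internal_encoding('UTF-8');

// "héllo", written with an escape so the file's own encoding does not matter.
$word = "h\u{00E9}llo";

$byteLength = strlen($word);              // counts bytes: é takes two in UTF-8
$charLength = mb_strlen($word, 'UTF-8');  // counts characters

$firstTwoChars = mb_substr($word, 0, 2, 'UTF-8'); // cuts on character boundaries
$upper = mb_strtoupper($word, 'UTF-8');           // case mapping that knows about é

printf("%d bytes, %d characters\n", $byteLength, $charLength);
```

Using substr($word, 0, 2) here instead would cut the é in half and leave an invalid byte sequence, which is the kind of bug the mb_* functions avoid.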

This should allow you to have UTF-8 everywhere, from the pages to the data. A test you can do: right-click anywhere on the page in Firefox and select View Page Info. The effective encoding is listed there.

Palantir
Note: all pages of the text/* variety that you serve should be UTF-8, **including JavaScript files**. Some browsers have problems if the page is UTF-8 and JS files are not.
Piskvor
+1  A: 

Important: You should also ensure that you use UTF-8 as the connection charset when connecting to MySQL from PHP!

For mysqli this is done by

mysqli_set_charset($dblink, 'utf8')

http://de3.php.net/manual/en/mysqli.set-charset.php
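
A minimal sketch of how this might look in a connection helper, assuming the mysqli extension is available. The function name and all connection parameters are hypothetical placeholders. Note that MySQL spells the charset name utf8, without a hyphen.

```php
<?php
// Hypothetical helper; host, credentials and database name are placeholders.
function openUtf8Connection(string $host, string $user, string $pass, string $db): mysqli
{
    $link = mysqli_connect($host, $user, $pass, $db);
    if ($link === false) {
        throw new RuntimeException('Connect failed: ' . mysqli_connect_error());
    }

    // MySQL's name for the charset is "utf8" (no hyphen). "utf8mb4" also
    // covers characters outside the Basic Multilingual Plane (MySQL 5.5.3+).
    if (!mysqli_set_charset($link, 'utf8')) {
        throw new RuntimeException('Cannot set charset: ' . mysqli_error($link));
    }

    return $link;
}
```

Setting the charset through the API rather than only through a SET NAMES query also tells the client library which encoding is in use, which matters for mysqli_real_escape_string.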

Pesse
+1  A: 

Some things you will need to look into:-

PHP

Make sure your content is marked as UTF-8 in php.ini:

default_charset = "utf-8"

Install mbstring. You can find it here
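
If you cannot edit php.ini, something like the following at the top of a bootstrap file should have a similar effect. This is only a sketch, assuming it runs before any output is sent.

```php
<?php
// Sketch of a bootstrap check; assumes it runs before any output is sent.
if (!extension_loaded('mbstring')) {
    throw new RuntimeException('The mbstring extension is not loaded.');
}

// Runtime equivalent of default_charset = "utf-8" in php.ini.
ini_set('default_charset', 'UTF-8');

// Make the mb_* functions default to UTF-8 when no encoding is passed.
mb_internal_encoding('UTF-8');
```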

Ensure that you are talking utf-8 between PHP and MySQL.
Call mysql_set_charset("utf8"); (or use the SQL query SET NAMES utf8)

Apache

You can also set the charset sent in the Content-Type header of your pages here, with something like this:

AddDefaultCharset utf-8

MySQL

Make sure all your tables use the utf8 character set with a collation such as utf8_general_ci, e.g.:

ALTER DATABASE mydb CHARACTER SET utf8;
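
Be aware that ALTER DATABASE only changes the default for tables created afterwards; existing tables keep their charset until they are converted. A rough sketch of a helper that builds the statements follows. The helper and the table name articles are hypothetical, and the statements are only built here, not executed.

```php
<?php
// Hypothetical helper: builds the statements, does not run them.
// $database and $tables are placeholder names supplied by the caller.
function buildUtf8ConversionSql(string $database, array $tables): array
{
    // Quote identifiers with backticks, doubling any embedded backtick.
    $quote = function (string $name): string {
        return '`' . str_replace('`', '``', $name) . '`';
    };

    // Changes the default used for tables created from now on.
    $sql = ['ALTER DATABASE ' . $quote($database)
        . ' CHARACTER SET utf8 COLLATE utf8_general_ci'];

    // Existing tables keep their old charset until converted one by one.
    foreach ($tables as $table) {
        $sql[] = 'ALTER TABLE ' . $quote($table)
            . ' CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci';
    }

    return $sql;
}

print_r(buildUtf8ConversionSql('mydb', ['articles']));
```

Back up the data before running CONVERT TO on a table whose contents may already be stored in a different encoding than the one the column declares.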

Finally

Test everything with fun Unicode samples, like this one:

٩(͡๏̯͡๏)۶
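
One way to automate that check is to compare what you sent with what came back from the form or the database. This is a small sketch, assuming the mbstring extension and a source file saved as UTF-8; the function name is made up for illustration.

```php
<?php
// Hypothetical check; assumes mbstring and that this file is saved as UTF-8.
function survivedRoundTrip(string $sent, string $received): bool
{
    // The value must still be well-formed UTF-8 and byte-identical.
    return mb_check_encoding($received, 'UTF-8') && $sent === $received;
}

$sample = '٩(͡๏̯͡๏)۶';

// A lone 0xC3 lead byte followed by "(" is not valid UTF-8.
$brokenBytes = "\xC3\x28";

var_dump(survivedRoundTrip($sample, $sample));       // intact copy
var_dump(survivedRoundTrip($sample, '?(?????)?'));   // lossy copy, characters replaced by '?'
var_dump(mb_check_encoding($brokenBytes, 'UTF-8'));  // malformed input
```

In a real test, the second argument would be the value read back from the database or from $_POST rather than a literal.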

More helpful information from when I tried this...

rikh
+1  A: 

Other answers recommend using either an HTTP header or a meta element to set the charset of your pages to UTF-8. The W3C recommends that you do both, and that the meta element appear as early as possible on the page. (All characters before the meta element should be ASCII, which is encoded identically in almost all character encodings. Some browsers will restart page rendering when they encounter the meta tag, which is another good reason to have it early.)

Also, on all forms accepting user input put an accept-charset="utf-8" attribute. Generally browsers submitting POST data will default to the encoding of the page, but it's no harm to be sure.
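
Putting those pieces together might look roughly like this. It is only a sketch: the function name, the page content and the comment form field are invented for the example.

```php
<?php
// Hypothetical page renderer; the function name and form field are illustrative.
function renderCommentForm(string $lastComment): string
{
    // Escape user text as UTF-8 so multi-byte characters pass through intact.
    $safe = htmlspecialchars($lastComment, ENT_QUOTES, 'UTF-8');

    // The meta element comes first inside <head>; only ASCII precedes it.
    return <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Comments</title>
</head>
<body>
<p>{$safe}</p>
<form method="post" action="" accept-charset="utf-8">
<input type="text" name="comment" />
<input type="submit" value="Send" />
</form>
</body>
</html>
HTML;
}

// Send the HTTP header as well, before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo renderCommentForm("caf\u{00E9}");
```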

TRiG
A: 

I used the methods mentioned and they worked fine until recently, when my provider updated PHP to 5.2.11 and MySQL to 5.0.81-community. After this change, Unicode characters were still retrieved from the database correctly, but all updates were corrupted: Unicode characters were replaced by '?'.

The solution was to use:

mysql_set_charset('utf8',$conn);

It was required even though we used:

SET NAMES utf8
SET CHARACTER SET utf8

Also, since we use ADOdb, we needed to get at the underlying PHP connection handle. We used the following statement:

mysql_set_charset('utf8',$adoConn->_connectionID);
agsamek