
What extensions would you recommend, and how should PHP best be configured, to create a website that uses UTF-8 encoding for everything? For example:

  • Page output is UTF-8
  • Forms submit data encoded in UTF-8
  • Internal processing of string data (e.g. when talking to a database) is all in UTF-8 as well.

It seems that PHP does not really cope well with multibyte character sets at the moment. So far I have worked out that mbstring looks like an important extension.

Is it worth the hassle?

+1  A: 

If mbstring isn't already part of your PHP package, then I would definitely recommend it; you'll even want to use it for calculating string lengths (mb_strlen($string_var, 'UTF-8')) for form input. Otherwise you won't need anything except valid and proper HTML, a correct HTTP server configuration (so the server delivers pages using UTF-8) and a text editor with UTF-8 support (e.g. Notepad++).
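A minimal sketch of that kind of form-length check, assuming the mbstring extension is loaded. The `validate_max_length()` helper is hypothetical; the point is the contrast between `strlen()`, which counts bytes, and `mb_strlen()`, which counts characters:

```php
<?php
// Hypothetical helper: checks a submitted form field against a maximum
// length measured in characters, not bytes. Requires ext-mbstring.
function validate_max_length(string $value, int $maxChars): bool
{
    // Reject byte sequences that are not valid UTF-8 before counting.
    if (!mb_check_encoding($value, 'UTF-8')) {
        return false;
    }
    return mb_strlen($value, 'UTF-8') <= $maxChars;
}

// "Jürgen": 6 characters, but 7 bytes in UTF-8 because "ü" takes 2 bytes.
$name = "J\u{00FC}rgen";

echo strlen($name), "\n";                 // 7 (bytes)
echo mb_strlen($name, 'UTF-8'), "\n";     // 6 (characters)
var_dump(validate_max_length($name, 6));  // bool(true)
```

With plain `strlen()`, a six-character limit would wrongly reject this name.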

Augenfeind
+1  A: 

In your php.ini, set mbstring.internal_encoding = UTF-8 so that you don't need to pass an encoding parameter to the mb_ functions every time.
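If php.ini is not editable, roughly the same effect can be had per request with `mb_internal_encoding()`. A sketch, assuming ext-mbstring; note that the `mbstring.internal_encoding` ini setting has been deprecated since PHP 5.6 in favor of `default_charset`, while the runtime function remains available:

```php
<?php
// Set the default encoding used by mb_* functions for this request,
// the runtime counterpart of mbstring.internal_encoding in php.ini.
mb_internal_encoding('UTF-8');

$word = "caf\u{00E9}"; // "café"

// No encoding argument needed now; UTF-8 is used implicitly.
echo mb_strlen($word), "\n";        // 4
echo mb_strtoupper($word), "\n";    // CAFÉ
echo mb_internal_encoding(), "\n";  // UTF-8
```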

Ben James
A: 

Don't worry about it. For me, default installations of Apache2/PHP have worked perfectly, delivering full UTF-8 support. The only thing I had to change was the collation of MySQL tables.

And, yes, use a UTF-8-capable source editor.

Andrejs Cainikovs
Did your strlen and other functions continue to work properly?
Jorre
Yes, absolutely.
Andrejs Cainikovs
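Whether byte-based functions "work" depends on the task. They are harmless for storing and echoing UTF-8, but they count and cut bytes, not characters. A short sketch of how `substr()` can split a multibyte character while `mb_substr()` does not (assumes ext-mbstring):

```php
<?php
$text = "na\u{00EF}ve"; // "naïve": 5 characters, 6 bytes

// substr() counts bytes: the first 3 bytes end in the middle of "ï".
$bytes = substr($text, 0, 3);
// mb_substr() counts characters: "naï".
$chars = mb_substr($text, 0, 3, 'UTF-8');

var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($chars, 'UTF-8')); // bool(true)
echo $chars, "\n";                            // naï
```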
+2  A: 

PHP copes just fine!

You should set the php.ini "default_charset" parameter to 'utf-8'.

Then make sure that:

<head>
  <meta http-equiv="Content-Type"
    content="text/html; charset=utf-8"
    />

is at the top of every page you serve.
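Where php.ini is off-limits, the same setting can be applied at runtime, and the HTTP header can also be sent explicitly. A sketch, assuming the code runs before any output is sent; the `page_head()` helper is hypothetical and simply emits the matching meta tag, escaping the title as UTF-8:

```php
<?php
// Runtime equivalent of default_charset = "UTF-8" in php.ini.
ini_set('default_charset', 'UTF-8');

// Explicit header; header() only works before any output has been sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper returning a head element with the matching meta tag.
function page_head(string $title): string
{
    return "<head>\n"
        . '  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . "\n"
        . '  <title>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</title>\n"
        . "</head>\n";
}

echo page_head("Caf\u{00E9} & Bar");
```

Keeping the HTTP header and the meta tag in agreement avoids browsers guessing between two declared charsets.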

There are a few problem areas:

Databases -- make sure they are configured to use utf-8 by default or enter a world of pain.

IDEs/Editors -- a lot of editors don't support UTF-8 well. I normally use vim, which doesn't, but it's never been a big problem.

Documents -- I just spent a whole afternoon getting PHP to read Thai characters out of a spreadsheet. I was eventually successful but am still not sure what I did right.
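The answer doesn't say what fixed the spreadsheet import, so this is only a general sketch: text read from files is often in a legacy encoding and has to be converted to UTF-8 before use. The `to_utf8()` helper and the ISO-8859-1 source encoding are assumptions for illustration; a Thai file would need its actual source encoding instead.

```php
<?php
// Hypothetical helper: return $raw as UTF-8, converting from $fromEncoding
// only when the bytes are not already valid UTF-8.
function to_utf8(string $raw, string $fromEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already UTF-8 (this includes plain ASCII)
    }
    return mb_convert_encoding($raw, 'UTF-8', $fromEncoding);
}

$latin1 = "Caf\xE9";        // "Café" as ISO-8859-1 bytes
$utf8 = to_utf8($latin1);

echo bin2hex($utf8), "\n";  // 436166c3a9
```

The validity check is a heuristic: some legacy byte sequences happen to be valid UTF-8, so knowing the real source encoding is safer than guessing.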

James Anderson
+2  A: 

The supposed issues of PHP with Unicode content have been somewhat overstated. I've been doing multilingual websites since 1998 and never knew there might be an issue until I read about it somewhere, many years and websites later.

This works just fine for me:

Apache configuration (in httpd.conf or .htaccess)

AddDefaultCharset utf-8

PHP (in php.ini)

default_charset = "utf-8"
mbstring.internal_encoding=utf-8
mbstring.http_output=UTF-8
mbstring.encoding_translation=On
mbstring.func_overload=6
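mbstring.func_overload=6 combined flag 2 (string functions such as strlen(), strpos(), substr() and strtoupper()) with flag 4 (the old ereg regex functions), silently routing them to their mb_* counterparts. That directive was deprecated in PHP 7.2 and removed in PHP 8.0, so on current versions the multibyte calls have to be explicit. A sketch, assuming ext-mbstring; for regular expressions it swaps in PCRE's `/u` modifier rather than mb_ereg:

```php
<?php
// Explicit multibyte calls that mbstring.func_overload used to substitute
// for strlen(), strpos(), substr() and strtoupper().
$s = "\u{00DC}ber caf\u{00E9}"; // "Über café"

echo mb_strlen($s, 'UTF-8'), "\n";            // 9
echo mb_strpos($s, 'caf', 0, 'UTF-8'), "\n";  // 5
echo mb_substr($s, 5, 4, 'UTF-8'), "\n";      // café
echo mb_strtoupper($s, 'UTF-8'), "\n";        // ÜBER CAFÉ

// The /u modifier makes PCRE treat pattern and subject as UTF-8,
// so \p{Lu} (uppercase letter) matches the multibyte "Ü".
var_dump(preg_match('/^\p{Lu}/u', $s));       // int(1)
```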

MySQL

CREATE your database with a utf8_* collation, let the tables inherit the database collation, and start every connection with "SET NAMES utf8".
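From PHP, the connection charset can also be set when connecting instead of issuing SET NAMES separately. A sketch, assuming PDO with the pdo_mysql driver; host, database name and credentials are placeholders, and the `mysql_utf8_dsn()` and `connect_utf8()` helpers are hypothetical. It uses utf8mb4 because MySQL's `utf8` is the 3-byte utf8mb3, which cannot store characters outside the Basic Multilingual Plane (most emoji, for example).

```php
<?php
// Hypothetical helper: build a PDO DSN whose charset option makes the
// driver use utf8mb4 for the connection, similar to SET NAMES utf8mb4.
function mysql_utf8_dsn(string $host, string $dbName): string
{
    return "mysql:host={$host};dbname={$dbName};charset=utf8mb4";
}

// Hypothetical helper: open the connection (needs a reachable MySQL server).
function connect_utf8(string $host, string $dbName, string $user, string $pass): PDO
{
    return new PDO(mysql_utf8_dsn($host, $dbName), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Database-level default so new tables inherit it (run once, e.g. as admin).
const CREATE_DB_SQL =
    'CREATE DATABASE example_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci';

echo mysql_utf8_dsn('localhost', 'example_db'), "\n";
```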

HTML (in HEAD element)

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
djn
What does the "SET NAMES utf8" SQL statement actually do?
rikh
Straight from the MySQL docs: "A SET NAMES 'x' statement is equivalent to these three statements:

SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;"

This is handy because no matter which charset you use to store the data, the data still has to travel to and from PHP. One might never notice a problem while using a single computer (as in HTML FORM -> MySQL -> page), but using a devel machine to populate a db and moving it to the prod server to output it is risky, as the two may well have different client charsets. SET NAMES means portability.
djn