ansaurus

Question

Changing character encoding in MySQL, PHP scripts, HTML

Answer 1

+1 A:

It's tricky.

You have to:

change the DB and every table character set/encoding – I don't know much about MySQL, but see here
set the client encoding to UTF-8 in PHP (SET NAMES UTF8) before the first query
change the meta tag and possibly the Content-type header (note the Content-type header has precedence)
convert all the PHP files to UTF-8 without BOM – you can easily do that with a loop and iconv.
the trickiest of all: you have to change most of your string function calls. That means mb_strlen instead of strlen, mb_substr instead of substr and $str[index], etc.
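
The "loop and iconv" step could look roughly like the sketch below. It assumes the existing files are ISO-8859-1 (latin1) and that the iconv and mbstring extensions are loaded; `convert_file_to_utf8` and `convert_tree_to_utf8` are hypothetical helper names, not part of PHP. The "already valid UTF-8" check is only a heuristic to avoid converting a file twice, so run it on a copy or under version control first.

```php
<?php
// Hypothetical helper: rewrite one file as UTF-8 without a BOM.
// $from is the assumed current encoding of the file.
function convert_file_to_utf8(string $path, string $from = 'ISO-8859-1'): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        return false;
    }

    if (mb_check_encoding($bytes, 'UTF-8')) {
        // Already valid UTF-8 (or plain ASCII): leave the bytes alone.
        $utf8 = $bytes;
    } else {
        $utf8 = iconv($from, 'UTF-8', $bytes);
        if ($utf8 === false) {
            return false;
        }
    }

    // Strip a leading UTF-8 byte order mark if one is present.
    if (strncmp($utf8, "\xEF\xBB\xBF", 3) === 0) {
        $utf8 = substr($utf8, 3);
    }

    return file_put_contents($path, $utf8) !== false;
}

// Hypothetical helper: convert every .php file below $dir.
function convert_tree_to_utf8(string $dir): void
{
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            convert_file_to_utf8($file->getPathname());
        }
    }
}
```

Calling `convert_tree_to_utf8(__DIR__ . '/src')` would walk a (hypothetical) `src` directory; templates with other extensions would need the extension check adjusted.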

Artefacto 2010-06-07 10:00:13

DB - check, client encoding - you mean when interfacing with the MySQL server through PHP? meta tag - check, PHP files - check, PHP functions... Uh, ok. While I don't use strlen and substr all that much - what about that $str[index]? Do you mean that while writing in a UTF8-encoded PHP file, I can't write <? print $foo["Översrift"] ?> Presumably, the string is sent to the PHP interpreter as UTF8 data and the saved indexed data should be identical, no?

Sandman 2010-06-07 11:06:28

As long as no data comes from elsewhere, $foo["Översrift"] would indeed keep working, provided all files are converted to UTF-8.

Wrikken 2010-06-07 16:07:32

@Sandman yes, I mean when interfacing with the MySQL server through PHP. What I mean by `$str[index]` is stuff like `$str[0]` (index is an integer). For instance, you cannot use `$str[0]` to get the first character because UTF-8 is a multi-byte encoding; if the first character takes more than 1 byte (which is the case for all non-ASCII characters), `$str[0]` will get only the first byte of the character. There are many other cases – the majority of functions that operate on strings will have to be modified.
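
A small illustration of the byte-versus-character difference, assuming the mbstring extension is available. The sample word "Öl" is just an example; it is written with explicit byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// "Öl" as UTF-8: "Ö" takes two bytes (0xC3 0x96), "l" takes one.
$str = "\xC3\x96l";

$firstByte = $str[0];                         // "\xC3": only half of "Ö"
$firstChar = mb_substr($str, 0, 1, 'UTF-8');  // "\xC3\x96": the whole "Ö"

echo strlen($str), "\n";               // 3 (bytes)
echo mb_strlen($str, 'UTF-8'), "\n";   // 2 (characters)
```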

Artefacto 2010-06-07 23:31:16

Right, then I'm with you. I'd never use $str[index] that way :)

Sandman 2010-06-08 13:52:32

Answer 2

A:

Don't convert to UTF8 if you don't have to. It's not worth the trouble.
UTF8 is (becoming) the new standard, so for new projects I can recommend it.

Functions
Certain function calls don't work anymore. For latin1 it's:

 echo htmlentities($string);

For UTF8 it's:

 echo htmlentities($string, ENT_COMPAT, 'UTF-8');

strlen(), substr(), etc. aren't aware of multibyte characters.
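
To make the encoding argument concrete, here is a minimal sketch that passes the encoding explicitly in both cases. The sample strings are written as byte escapes so the example does not depend on the file's own encoding. Note that the default encoding of htmlentities() changed to UTF-8 in PHP 5.4, so on current versions the one-argument call is the UTF-8 form; passing the encoding explicitly avoids relying on that default.

```php
<?php
$utf8   = "\xC3\xAB";  // "ë" encoded as UTF-8 (two bytes)
$latin1 = "\xEB";      // "ë" encoded as ISO-8859-1 (one byte)

echo htmlentities($utf8, ENT_COMPAT, 'UTF-8'), "\n";         // &euml;
echo htmlentities($latin1, ENT_COMPAT, 'ISO-8859-1'), "\n";  // &euml;

// Declaring the wrong encoding: the lone byte 0xEB is not valid UTF-8,
// so htmlentities() gives back an empty string (no ENT_SUBSTITUTE or
// ENT_IGNORE flag is set here).
$wrong = htmlentities($latin1, ENT_COMPAT, 'UTF-8');
var_dump($wrong === '');
```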

MySQL
mysql_set_charset('UTF8') or mysql_query('SET NAMES UTF8') will convert all text coming from the database (SELECTs) to UTF8. It will also convert incoming strings (INSERT, UPDATE) from UTF8 to the encoding of the table.

So for reading from a latin1 table it's not necessary to convert the table encoding.
But certain characters are only available in Unicode (like the snowman ☃, iPhone emoticons, etc.) and can't be converted to latin1. (The data will be truncated.)
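
One way to see which strings would survive a trip into a latin1 column is a round-trip check on the PHP side. This is only a sketch: `fits_in_latin1` is a hypothetical helper, it needs the mbstring extension, and it tests against ISO-8859-1, whereas MySQL's latin1 is based on cp1252, so the two differ for a handful of characters.

```php
<?php
// Hypothetical helper: true if every character of the UTF-8 string $utf8
// survives a round trip through ISO-8859-1 unchanged.
function fits_in_latin1(string $utf8): bool
{
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    $back   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
    return $back === $utf8;
}

var_dump(fits_in_latin1("\xC3\xAB"));      // "ë": bool(true)
var_dump(fits_in_latin1("\xE2\x98\x83"));  // "☃": bool(false)
```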

Scripts
I try to avoid special characters in my PHP scripts / templates.
I use the &euml; notation instead of ë, etc. This way it doesn't matter whether the file is saved in latin1 or utf8.

Bob Fanger 2010-06-07 14:18:39

MySQL tables would not have to be converted as long as what you're saving is available in their current character set. However, if it's not (and that's no small possibility when going latin1 => utf8), they should be converted (ALTER TABLE foo CONVERT TO CHARACTER SET utf8), possibly columns by themselves if they have been separately set.

Wrikken 2010-06-07 14:33:33

No, if you change the encoding for the connection the mysql server/client will convert it on-the-fly.

Bob Fanger 2010-06-07 14:38:49

I use it if I need to generate an MS Excel CSV file. Tables are in UTF8 and after a `SET NAMES latin1` I can write to the CSV file without a single utf8_decode()

Bob Fanger 2010-06-07 14:42:01

@Bob Fanger: think about writes to table, not reads. Yes, conversion is attempted, but putting utf-8 in latin1 is simply not always possible, or am I mistaken? If the character sets overlap 100%, why use the one over the other?

Wrikken 2010-06-07 16:24:01

@Wrikken You're not mistaken. Obviously you cannot put in a latin1 column characters that are not in latin1 like ى.

Artefacto 2010-06-07 23:32:27

@Artefacto @Wrikken Valid point, I updated the answer.

Bob Fanger 2010-06-08 09:41:36
