While reading "High Performance MySQL" from O'Reilly, I stumbled upon the following:

Another common garbage query is SET NAMES UTF8, which is the wrong way to do things anyway (it does not change the client library's character set; it affects only the server).

I'm a bit confused, because I used to put "SET NAMES utf8" at the top of every script to let the database know that my queries are UTF-8 encoded.

Can anyone comment on the above quote or, to put it more formally, suggest best practices to ensure that my database workflow is Unicode-aware?

My target languages are PHP and Python, if this is relevant.

Many thanks.

+1  A: 

Not sure about Python, but PHP now has mysql_set_charset(), which the manual describes as the "preferred way to change the charset [and] using mysql_query() to execute SET NAMES is not recommended." Note that this function requires MySQL 5.0.7 or later, so it won't work with earlier versions.

mysql_set_charset('utf8', $link);

where $link is a connection created with mysql_connect().
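
For Python, a rough counterpart is to pass the character set when the connection is opened, so the driver and the session agree from the start. The sketch below assumes the third-party PyMySQL driver and its `charset` keyword argument; the helper names `utf8_connect_kwargs` and `connect_utf8` are made up for illustration, and the connection values are placeholders.

```python
def utf8_connect_kwargs(host, user, password, database, charset="utf8"):
    """Collect connection settings, including the character set.

    The keys follow PyMySQL's connect() keyword arguments; the helper
    itself is hypothetical and only shows where the charset goes.
    """
    return {
        "host": host,
        "user": user,
        "password": password,
        "database": database,
        # Passing the charset at connect time configures the driver as well
        # as the session, so no SET NAMES query has to be issued by hand.
        "charset": charset,
    }


def connect_utf8(**settings):
    """Open a connection with PyMySQL (assumed to be installed)."""
    import pymysql  # third-party driver, imported lazily

    return pymysql.connect(**utf8_connect_kwargs(**settings))
```

A call would look like `connect_utf8(host="localhost", user="app", password="secret", database="shop")`, with every value replaced by your own.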

Typeoneerror
+3  A: 

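mysql_set_charset() is an option, but it is limited to ext/mysql. ext/mysqli provides the equivalent mysqli_set_charset(); PDO has no such method, although the pdo_mysql DSN accepts a charset parameter (honored as of PHP 5.3.6). Because these functions are MySQL API calls, they also update the client library's character set, and they should be no slower than issuing the query by hand.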

With regard to performance, the fastest way to ensure UTF-8-based communication between your script and the MySQL server is to set up the MySQL server correctly. SET NAMES x is equivalent to

SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;

where SET character_set_connection = x internally also executes SET collation_connection = <<default_collation_of_character_set_x>>. You can therefore also set these server variables statically in your my.ini/my.cnf.

Please be aware of possible problems with other applications running on the same MySQL server instance and requiring some other character set.
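
Whichever way you configure it, you can check from the script whether the three variables actually have the value you expect. This is only a sketch: it assumes a DB-API 2.0 cursor that returns plain (name, value) tuples, and the function name `mismatched_charset_variables` is made up for illustration.

```python
# The three session variables that SET NAMES assigns, as listed above.
SET_NAMES_VARIABLES = (
    "character_set_client",
    "character_set_results",
    "character_set_connection",
)


def mismatched_charset_variables(cursor, expected="utf8"):
    """Return {variable: value} for each SET NAMES variable that differs.

    `cursor` is assumed to be a DB-API 2.0 cursor on an open MySQL
    connection that yields (Variable_name, Value) tuples.
    """
    cursor.execute("SHOW SESSION VARIABLES LIKE 'character_set_%'")
    current = {name: value for name, value in cursor.fetchall()}
    return {
        name: current.get(name)
        for name in SET_NAMES_VARIABLES
        if current.get(name) != expected
    }
```

An empty result means the session already uses the expected character set for all three variables; anything else shows which setting still needs attention.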

Stefan Gehrig