I used iconv to convert a mysqldump of a database from latin1 to UTF-8. The dump came from MySQL v4.0.21, and I imported it onto a new server running MySQL v5.0.45.

It was latin1 on the old server, it’s utf8 on the new server, so I ran this on the mysql dump: iconv -f latin1 -t UTF-8 quickwebcms_2010-03-01.sql

It ran successfully, and then I imported the result onto the new server.

Now it displays question marks (?) (example: College?s) and ’ (example: College’s) when my PHP application prints out some of the data.

I exported the table these characters show up in, did a find-and-replace-all in TextMate, and imported it back into the new database. Now some of the fields come in as NULL, so the find and replace may have messed something up in the process. I saved the table CSV as UTF-8 without a BOM, and as plain UTF-8, and it still does the same thing.

Any help as to why this might be happening is appreciated.

A: 

You may be better off loading the data onto the new server as latin1, then running ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci on each table (or use a script of some sort to do it for you).

Or you could convert first, then dump.
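For the "script of some sort", here is a minimal shell sketch that only prints one conversion statement per table, so you can review the SQL before running anything. The function name gen_convert_sql, the table names pages and users, and the database name mydb are made up for illustration. It assumes the dump has already been loaded as latin1 and that you have a backup.

```shell
# Print one ALTER TABLE ... CONVERT TO statement per table name argument.
# gen_convert_sql is a hypothetical helper, not part of MySQL.
gen_convert_sql() {
    for table in "$@"; do
        printf 'ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;\n' "$table"
    done
}

# Example with two made-up table names; this only prints the SQL.
gen_convert_sql pages users

# Once the output looks right, it could be piped into the mysql client
# (mydb is a placeholder database name):
#   gen_convert_sql pages users | mysql mydb
```

Printing first and piping second keeps the destructive step explicit, which seems safer than looping over tables and altering them in one go.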

R. Bemrose
I agree, let MySQL do the work.
Shannon Weyrick
+1  A: 

IIRC, mysqldump produces UTF-8 output by default, no matter what the database's encoding is. This user comment in the MySQL manual seems to confirm it:

I am just using default character sets - normally latin1. However, the dump produced by mysqldump is, perhaps surprisingly, in utf8. This seems fine, but leads to trouble with the --skip-opt option to mysqldump, which turns off --set-charset but leaves the dump in utf8.

Perhaps the fact that mysqldump uses utf8 by default, and the importance of the --set-charset option should be more prominently documented (see the documentation for the --default-character-set attribute for the current mention of the use of utf8)

Try skipping the iconv step; the import might work straight away.
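If the dump really was UTF-8 already, the iconv step would have encoded it a second time, which would explain sequences like the one in College’s. Here is a rough shell sketch of that effect on a single character, plus a quick validity check you could run on the dump before deciding whether to convert. The helper names hex, latin1_to_utf8 and is_valid_utf8 are my own, and it assumes an iconv that accepts the names ISO-8859-1 and UTF-8.

```shell
# Print the bytes on stdin as one compact lowercase hex string.
hex() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# Treat stdin as ISO-8859-1 (what iconv calls latin1) and re-encode it
# as UTF-8, like the iconv command in the question does.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# Succeed only if stdin is valid UTF-8; iconv exits non-zero on bad input.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# U+2019 (the curly apostrophe) is the three bytes E2 80 99 in UTF-8,
# written as octal escapes so any POSIX printf understands them.
printf '\342\200\231' | hex                    # e28099
printf '\342\200\231' | latin1_to_utf8 | hex   # c3a2c280c299
```

The second line shows three bytes turning into six, starting with C3 A2, which is "â". To check a real dump, something like is_valid_utf8 < quickwebcms_2010-03-01.sql should succeed if the file is already UTF-8. A latin1 file containing accented characters would normally fail that check, although a pure ASCII file passes either way.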

Pekka
+1  A: 

If the content of your tables is all OK (and in UTF-8) and you still have "bad" characters in your Web application, make sure your MySQL connection is using the UTF-8 charset in your PHP script. Even if your databases and tables are in UTF-8, MySQL uses latin1 connections by default (at least in my shared server config). So you have to tell MySQL to send content in UTF-8. Otherwise it will convert it on the fly to latin1, producing "bad" characters in UTF-8 web pages.

Use mysql_set_charset if it is available; otherwise you can set the charset with an SQL query (always prefer mysql_set_charset when you have it):

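// $conn is an open connection resource from mysql_connect().
if (function_exists('mysql_set_charset')) {
    // Preferred: also tells the client library which charset is in use.
    mysql_set_charset('utf8', $conn);
} else {
    // Fallback for older PHP versions: set the charset variables by hand.
    $sql = "SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'";
    if (mysql_query($sql, $conn) === false) {
        // Error! Do something...
    }
}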

Also make sure your (X)HTML markup declares UTF-8:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
AlexV