Hi,

I am doing the following.

1) I export a database and save it to a file called dump.sql.
2) The file is then transferred to a different server via PHP FTP.
3) Once the file has been transferred successfully, the administrator has the option to run a 'dbtransfer' script on the new host.
4) This script explodes the dump file into separate queries and runs them line by line.

This works great - however there is a problem with foreign language encoding. We are using UTF-8.

Step 1: This works fine; the file is in UTF-8 format.
Step 3: When I test the contents of the dump.sql file using mb_check_encoding(), the string comes back as UTF-8.
Step 4: This creates tables with utf8_general_ci encoding, and the information is dumped in.

When I check the table after the transfer I get records like this: 'ç,Ç,ö,Ö,ü,Ü,ı,İ,ş,Ş,ğ,Ğ'. I don't understand how a UTF-8 string can lose its encoding when it goes into the database. Am I missing a step? Do I need to run some sort of function to ensure the string is parsed as UTF-8?

Once the system is installed I can save foreign language queries. It is just the transfer that is messing up.

Any ideas?

+1  A: 

Most likely, you did not tell MySQL that you were talking UTF-8 with it - the connection had the wrong character set. Use the mysql_set_charset function for that.

Also note that utf8_general_ci is not an encoding - it is a collation. In other words, it only tells MySQL how that column should be treated when comparing values (including when sorting).
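
To illustrate why the file can pass mb_check_encoding() and still end up garbled, here is a minimal sketch. It assumes the connection defaulted to latin1 (common on older MySQL setups, and effectively Windows-1252 in MySQL); the function name is made up for the demonstration and nothing here talks to a real database.

```php
<?php
// Hypothetical helper: mimics what happens when UTF-8 bytes are sent over a
// connection that MySQL believes is latin1 (assumed here; check your server).
function simulateLatin1Connection(string $utf8Text): string
{
    // Every single byte is treated as one Windows-1252 character
    // and then stored as UTF-8, which double-encodes the text.
    return mb_convert_encoding($utf8Text, 'UTF-8', 'Windows-1252');
}

$original = "\xC3\xA7";                          // "ç" encoded as UTF-8 (2 bytes)
$stored   = simulateLatin1Connection($original); // what lands in a utf8 column

echo $stored, "\n";                              // prints "Ã§"
var_dump(mb_check_encoding($stored, 'UTF-8'));   // still valid UTF-8, just wrong
```

The stored value is valid UTF-8, so checking the encoding of the data will not reveal the problem; the fix has to happen on the connection.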

Michael Madsen
+1  A: 

It's not clear how you are doing all these steps, but let's give it a shot.

Firstly, make sure all database connection settings related to character sets are set to utf-8. There are some on the database side and there are some on the client side.

Secondly, before inserting any data, run the following query:

SET NAMES 'utf8';
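
For a transfer script like the one described in the question, that could look roughly like the sketch below. The function name runDump and the $execute callback are assumptions made for illustration (in real code the callback would wrap your mysql/mysqli/PDO query call), and splitting on ";\n" is deliberately naive: it will break on statements that contain a semicolon followed by a newline inside a string.

```php
<?php
// Hypothetical sketch: run a dump statement by statement, setting the
// connection charset first. $execute is any callable that runs one query.
function runDump(string $dump, callable $execute): int
{
    // Tell MySQL the client is sending UTF-8 before any data goes in.
    $execute("SET NAMES 'utf8'");

    $count = 0;
    foreach (explode(";\n", $dump) as $statement) {
        $statement = trim($statement);
        if ($statement === '') {
            continue; // skip the empty piece after the last semicolon
        }
        $execute($statement);
        $count++;
    }
    return $count; // number of dump statements executed
}

// Usage with a recording callback instead of a real connection:
$log = [];
$executed = runDump(
    "CREATE TABLE t (name VARCHAR(10));\nINSERT INTO t VALUES ('x');\n",
    function (string $sql) use (&$log) { $log[] = $sql; }
);
```

The point is only the ordering: the charset query goes out on the same connection, before the first statement from the dump.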
zaf
+1  A: 

After I connect to a DB in PHP I always execute the following query on the connection object to make sure the connection is using UTF-8:

$pdo->exec('SET NAMES \'utf8\' COLLATE \'utf8_unicode_ci\'');

Another possibility is that the target table itself is not using UTF-8.
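
On the connection side, PDO can also be given the charset in the DSN (the charset option is honoured from PHP 5.3.6 onward), which avoids a separate query. A minimal sketch, where buildUtf8Dsn is a hypothetical helper and the host, database name and credentials are placeholders:

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that includes the charset.
function buildUtf8Dsn(string $host, string $dbName, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

$dsn = buildUtf8Dsn('localhost', 'mydb');
echo $dsn, "\n"; // prints "mysql:host=localhost;dbname=mydb;charset=utf8"

// Connecting would then look like this (placeholder credentials, not run here):
// $pdo = new PDO($dsn, 'user', 'secret');
```

If you need characters outside the Basic Multilingual Plane, passing 'utf8mb4' as the third argument is worth considering, provided the server and tables support it.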

Bas