Migrating Data from MySQL server1 to MySQL server2

server1 Ver 14.12 Distrib 5.0.51a, for debian-linux-gnu (x86_64) using readline 5.2

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+------------------------------------------+
| Variable_name            | Value                                    |
+--------------------------+------------------------------------------+
| character_set_client     | utf8                                     |
| character_set_connection | utf8                                     |
| character_set_database   | utf8                                     |
| character_set_filesystem | binary                                   |
| character_set_results    | utf8                                     |
| character_set_server     | latin1                                   |
| character_set_system     | utf8                                     |
| character_sets_dir       | /data/mysql/gabino/share/mysql/charsets/ |
+--------------------------+------------------------------------------+
8 rows in set

server2 Ver 14.12 Distrib 5.0.90, for pc-linux-gnu (x86_64) using readline 6.0

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set

MySQL on server1 is the backend of a WordPress blog. Everything works fine from the frontend, but I (the unlucky guy) have to migrate the data, so I logged into phpMyAdmin and the MySQL console. From the backend, every East Asian character on server1 appears garbled, both in SELECT queries in the console and in mysqldump files. For example, the Chinese character 看 shows up as the three latin1 characters çœ‹, which is the same result as SELECT _latin1'看'. The UTF-8 representation of 看 is \xe7\x9c\x8b, so MySQL is somehow displaying each byte as an individual latin1 character instead of rendering the three bytes as one Chinese character.

Even when I use the 'Data Transfer' function in Navicat 8 to copy the two databases from server1 to server2 as-is, the new blog running on server2 shows garbled characters. I tried various methods, such as SET NAMES utf8, and still cannot get it to work.

So how can I tell/force server1 MySQL to handle the latin1 characters as utf8 and get them displayed and dumped correctly?

+1  A: 

Do a hex dump (ie: SELECT HEX(columnname) FROM table) on both servers and see if the data is the same. If it is, then you'll know that at least the data didn't get corrupted.

If it is the same, you just need to set the correct charset and collation for the server(s). If it is not, you'll probably have to redo the data transfer, this time making sure the settings are correct.

Also make sure the browser's encoding is set to UTF-8.

EDIT: So, data did get corrupted in the transfer. C3A7C593E280B9 is the UTF-8 representation of çœ‹. This is probably because server1 is sending data as latin1, and server2 encodes that into UTF-8.
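The double encoding can be reproduced outside MySQL for the character 看 (U+770B, UTF-8 bytes E7 9C 8B). The following is only a minimal Python sketch: it assumes that MySQL's latin1 behaves like Python's cp1252 codec for these particular bytes (the two differ for a handful of byte values such as 0x81), and the function name `double_encode` is made up for illustration.

```python
def double_encode(text: str) -> bytes:
    """Encode text as UTF-8, misread those bytes as cp1252, then encode as UTF-8 again."""
    utf8_bytes = text.encode("utf-8")        # correct single encoding
    misread = utf8_bytes.decode("cp1252")    # each byte taken as one Western European character
    return misread.encode("utf-8")           # second, unwanted UTF-8 encoding


original = "看"  # U+770B
print(original.encode("utf-8").hex().upper())   # E79C8B
print(double_encode(original).hex().upper())    # C3A7C593E280B9
```

The second value matches the hex dump reported for server1, which is consistent with the column holding double-encoded text rather than plain UTF-8.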

You have to change the connection settings on server1 before transferring data. To do that, run these queries:

SET CHARACTER SET utf8; SET NAMES utf8

Then try the data transfer again.

EDIT 2: Based on what you said, here's what I think is happening. The data sitting on your database is encoded in UTF-8. When PHP (Wordpress) fetches this data, it "thinks" it's encoded in latin1 (ISO-8859-1), which is (unfortunately) what PHP uses by default. PHP goes on to serve this data to the user's browser as if it was encoded in latin1, but sets the character encoding as UTF-8, and the user sees what he's supposed to see.

In short, it's a case of two wrongs making a right. You now have two options:

  1. Fix the data (i.e. read it as UTF-8 and write it back as latin1, so the stored bytes become plain UTF-8 again).

  2. Set server2's connection settings to the same as server1, which will result in data still being displayed correctly.
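At the byte level, option 1 amounts to reversing one layer of encoding. A rough Python sketch of that round trip follows, under the same assumption that cp1252 stands in for MySQL's latin1; `undo_double_encoding` is a hypothetical helper, not part of any MySQL or WordPress tooling.

```python
def undo_double_encoding(stored: bytes) -> str:
    """Reverse one layer of UTF-8 -> cp1252 -> UTF-8 double encoding.

    Raises UnicodeError if the input is not double-encoded text, so rows
    that are already correct are not silently altered.
    """
    mojibake = stored.decode("utf-8")            # the garbled characters as stored
    original_bytes = mojibake.encode("cp1252")   # "write it back as latin1"
    return original_bytes.decode("utf-8")        # those bytes are the real UTF-8 text


stored = bytes.fromhex("C3A7C593E280B9")         # hex dump from server1
repaired = undo_double_encoding(stored)
print(repaired.encode("utf-8").hex().upper())    # E79C8B
print(repaired == "看")                           # True
```

Anything like this should be tried on a copy of the data first, since text containing bytes that cp1252 leaves undefined would raise an error instead of converting.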

quantumSoup
Wow, thanks man, looks like it's double encoded. `看` turned into `C3A7C593E280B9`. WTF is going on?
est
@est refer to edit
quantumSoup
@Aircule Thank you, but I already tried that. After running `SET CHARACTER SET utf8; SET NAMES utf8` in the server1 console, the result of `SELECT HEX(column) FROM table` is still `C3A7C593E280B9`
est
Is the hex dump from server1 or server2?
quantumSoup
it's on server1. Nothing is done on server2 yet. BTW `mysqldump --default-character-set=latin1 --set-charset database` can get a correct utf8 .sql file without BOM.
est
@est and the browser still displays the blog correctly? What is the encoding the browser uses when it loads the blog pages?
quantumSoup
@Aircule blog on server1 is OK (as always). Browser encoding says utf-8
est
@est edited again.
quantumSoup
@Aircule Thanks a lot. I guess I have to config server2 the same wrong way as server1. LOL
est