views:

463

answers:

6

I'm trying to import a large SQL file; created by mysqldump on one server, then converted with iconv; using these commands:

$ mysqldump -uxxx -p xxx > data.sql
$ iconv -f UTF-8 -t ISO-8859-1//TRANSLIT data.sql data.iconv.sql

I noticed that the "\x91" in the database had turned into "\xc3\x82\xc2\x91" in the dump, because mysqldump tries to convert everything to utf-8. I converted it back to "\x91" with iconv. If I do not convert them, they end up being "\xc2\x3f" on the new server; instead of just "\x3f" as we have now.

So, as read before, there are some characters unique to the cp1252 encoding, for example "\x91" and "\x92". Both these characters have turned to "\x3f" in the new database; this is the command I use to import:

$ /opt/mysql5/bin/mysql -uxxx -p -Dxxx < data.iconv.sql

Anybody knows how this could happen and what to do to prevent it? Knowing this I should be able to migrate data exactly as it was.

Extra information:

I used this source for the iconv-things. As you can see, on this page one other person had troubles with cp1252 as well; maybe that's the cause.

The serverversion are:

old-host: Ver 10.10 Distrib 5.0.18, for pc-linux-gnu (i486) new-host: Ver 10.11 Distrib 5.0.51, for pc-linux-gnu (i686)

A: 

If your data is cp1252 why are you telling iconv its utf-8?

James Anderson
Mentioned that in the originals question; If i do not, mysqldump generates \xc2\x91 instead of \x91. I believe I read on the same page as that solution that mysqldump tries to convert everything to utf-8. In this case, wrongly
gx
A: 

I still haven't found a working solution; we're going to try migrating using a script that makes 2 connections and simply selects from one connection and updates the other...

gx
A: 

Try:-

iconv -f IBM-1252 -t ISO-8859-1

Not sure why my previous reply was marked down. utf-8 is not ASCII! All the 7bit utf chars are identical to the 7 bit ASCI character set but the first bit b'1000000' has special meaning in UTF-8 and indicates that the character is a unicode two three or four byte character.

James Anderson
this is not a forum. if you have additional thoughts, edit your first answer
hop
A: 

Oops -- just getting the hang of this forum and didnt read your comment.

If mysql is converting to utf-8 then you want:

iconv -f utf-8 -t IBM-1252 xxxx

The should convert the open quote back to x"91". The "xc291" really is utf-8 for open quote.

James Anderson
A: 

What version of mysqldump are you using ?

Recent version 5 releases wrap table dumps with commands that set the character set, eg:

SET @saved_cs_client     = @@character_set_client;
SET character_set_client = utf8;

CREATE TABLE ...
...
...
SET character_set_client = @saved_cs_client;

Could the presence / absence of these be affecting your import ?

Martin
Answered your question in the main post!
gx
+1  A: 

On my system, mysqld defaults to saving data in latin1 with swedish collation. Similarly, the mysql command-line client defaults to providing data in latin1. mysqldump, on the other hand, defaults to utf-8.

This causes problems when exporting data via mysqldump and then importing using the mysql command-line client - characters that are not common to the two character sets are mutated.

The solution is to get mysqldump to decorate the data with additional commands that will correctly set the mysql clients character set:

mysqldump --set-charset ...

This will, by default, add "set name = utf-8" into the dumped data. This can now be imported cleanly by the mysql client.

Additional option "--default-character-set=xxx" can be used with mysqldump to convert the dump into something other than utf-8.

Using the -set-charset option should prevent you from having to use iconv at all.

Martin