views:

366

answers:

3

I have a mySQL database that has collation and character sets as follows:

mysql> show variables like "character_set_database";
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| character_set_database | utf8  |
+------------------------+-------+
1 row in set (0.00 sec)

mysql> show variables like "collation_database";
+--------------------+-----------------+
| Variable_name      | Value           |
+--------------------+-----------------+
| collation_database | utf8_unicode_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)

I have a table of countries, which consists of an, id and name. When I try to import this country (as an example with 'foreign' characters)

São Tomé and Príncipe

I get the following error:

  Unable to execute INSERT statement. [wrapped: SQLSTATE[HY000]: General error:
1366 Incorrect string value: '\xE3\xA3o To...' for column 'name' at row 1]

I have imported this data in the past - Does anyone have any idea why I cant import this data (as yml fixtures)?. I have succesfully imported it in the past. I dont rememeber what has changed since though

+2  A: 
\xE3\xA3o To...

Well it's right, that's not a valid byte sequence. ã in UTF-8 should be \xC3\xA3. Looks like somehow someone's loaded the import data as ISO-8859-1 and case-folded it to lowercase, transforming the C3 byte to E3. Naturally the results are no longer UTF-8.

Would need more detail about the import process to say more.

bobince
Bob: the data is being imported as a fixtures file (in YML format) by the Symfony (v1.31) task 'symfony propel:data-load. As I mentioned earlier, the file has been saved in UTF-8 format. I am developing on the windows platform. HTH.
Stick it to THE MAN
Bob: is there anyway to log the queries sent to mysql during the import? - (incase you are not familiar with the Symfony framework and data import task)
Stick it to THE MAN
Yeah, I'm not familiar with Symfony, but http://www.symfony-project.org/book/1_2/16-Application-Management-Tools reckons the `..._dev.log` file should be logging queries? I can't immediately see any doc for the data-load process that says anything about lowercasing, but *something* must be doing it. As another thought, is your connection itself using UTF-8? AIUI you should have `encoding: utf8` in your `databases.yml`.
bobince
A: 

Put <?php header('Content-Type: text/plain; charset=utf-8'); ?> at the top of your fixtures.yml file.

Adam V.
A: 

I wouldn't use fixtures.yml to upload complex data like this. I've got a very similar situation/setup as you and I use MySQL's LOAD DATA INFILE without any problems.

LOAD DATA INFILE 'C:/development/cities.csv' INTO TABLE project.city CHARACTER SET utf8
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r';

... where the cities.csv is a UTF-8 export from OpenOffice Calc.

Tom