views:

50

answers:

3

So, I've had some issues while trying to come over from Latin1 encoded databases, tables as well as columns, and now that everything is finally in UTF-8, I can't seem to update a row in a column. I am trying to replace an "e" with an e with acute (é). But it gives me this:

ERROR 1366 (HY000): Incorrect string value: '\x82m ...' for column 'Name' at row 1 when running this: UPDATE access SET Name='ém' WHERE id="2";

All databases gives me this when running the status command (except the 'current database' part ofcourse):


Connection id: 1
Current database: access
Current user: root@localhost
SSL: Not in use
Using delimiter: ;
Server version: 5.1.47-community MySQL Community Server (GPL)
Protocol version: 10
Connection: localhost via TCP/IP
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
TCP port: 3306
Uptime: 20 min 16 sec

Threads: 1 Questions: 110 Slow queries: 0 Opens: 18 Flush tables: 1 Open tables: 11 Queries per second avg: 0.90


And running the chcp command in cmd gives me '850'. Oh, and at some points I got this: ERROR 1300 (HY000): Invalid utf8 character string: 'ém' WHERE id="2"

I've looked everywhere for a solution, but I couldn't seem to find anything anywhere, and since I've always had good responses on Stackoverflow, I thought I'd ask here.

Thanks for any help!

A: 

When you input stuff on the command line, the strings will be in whatever character set the terminal uses. Why the mysql client doesn't translate that before sending it to the db still puzzles me, but it doesn't. You're probably sending latin1 to the db.

You could save your update SQL in a text file, make sure that text file is UTF-8, and run something like type myfile.txt | mysql db_name

nos
I watch this in Windows. `cmd.exe` makes some assumptions when it is piping information around. I think that `mysql --database=db_name --execute "SOURCE myfile.txt"` is probably safer. There is a mention of this somewhere in the documentation though I can't seem to find it at the moment.
D.Shawley
Oh, and the mysql client will transcode for you if the client character set and server character set differ. The code is buried in "client/sql_string.cc" if you are interested.
D.Shawley
A: 

This thread, although somewhat old, seems to result in the conclusion that cmd.exe and the mysql client don't handle UTF-8 encoding properly (with the blame being more aimed at cmd.exe).

Reading in SQL from a file is recommended, as is using an alternative client - or a flavour of UNIX. :)

Will A
Well, that answers the question. I ran it from a file, and it worked.Although, will the queries sent from the PHP code on my site be unicode? I have set the charsets in Apache and PHP to UTF-8 as well.Oh, and is there any other option that I can use instead of importing the code from a file?
Nisto
Glad to hear it, Nisto. Regarding other options - a decent MySQL front-end would be a good investment if you're wanting to run ad-hoc UTF-8 queries - you could probably knock something up in VS.Net instead if you've any programming experience. I'd hope that Apache / PHP will work fine - either as you have them configured or with some tweaks - if UTF-8 doesn't work with these then something's wrong in the world!
Will A
Yes! I never knew such a thing as these alternative 'MySQL front ends' existed, I just found 'HeidiSQL' and I'm loving it. Haha, editing lines just got much quicker too. Thanks for your answer!
Nisto
A: 

Well ... 0x82 is e-acute in code page 850. It would be 0xE9 in ISO-8859-1 which makes it something like 0xD0 0xB4 in UTF-8. I don't know if there is a good way to get a DOS window to handle UTF-8 input correctly. Here is an alternative if you are using the command line client. You can set the client character set to match whatever your local code page is and let the mysql library take care of the transcoding for you:

c:\> mysql --default-character-set=cp850
mysql> \s
--------------
mysql  Ver 14.14 Distrib 5.1.34, for apple-darwin9.6.0 (i386) using readline 5.2

Connection id:         17
Current database:
Current user:          daveshawley@localhost
SSL:                   Not in use
Current pager:         stdout
Using outfile:         ''
Using delimiter:       ;
Server version:        5.1.34-log Source distribution
Protocol version:      10
Connection:            localhost via TCP/IP
Server characterset:   ucs2
Db     characterset:   ucs2
Client characterset:   cp850
Conn.  characterset:   cp850
TCP port:              3306
Uptime:                19 days 8 hours 37 min 55 sec

Threads: 2  Questions: 248  Slow queries: 0  Opens: 71  Flush tables: 1  Open tables: 64  Queries per second avg: 0.0
--------------

I know that this works for the combination of latin1 in one window and utf8 in another window on my MacBook. I also verified that an ALTER TABLE ... CONVERT TO CHARACTER SET ucs2 did the right thing.

D.Shawley