views:

100

answers:

3

I have a MySQL table properly set to the UTF-8 character set. I suspect some data inserted into one of my columns has been double encoded. I am expecting to see a non-breaking space character (UTF-8 0xC2A0), but what I get when selecting this column out of this table is four octets (0xC3A2 0xC2A0). That's what I would expect to see if at some point somebody had treated an UTF-8 0xC2A0 as ISO-8859-1 then attempted to encode again to UTF-8 before inserting into MySQL.

My test above where I am seeing the four octets involves selecting this column out of MySQL with Perl's DBD::mysql. I'd like to take Perl and DBD::mysql out of the equation to verify that those four octets are actually what MySQL has stored. Is there a way to do this directly with a SQL query?

+1  A: 

Why not try the BINARY operator?

"The BINARY operator casts the string following it to a binary string. This is an easy way to force a column comparison to be done byte by byte rather than character by character."

http://dev.mysql.com/doc/refman/5.0/en/cast-functions.html

Hope this helps!

simeonwillbanks
+4  A: 
mysql> SELECT HEX(name) FROM mytable;
+-----------+
| hex(name) |
+-----------+
| 4142C2A0  | 
+-----------+
bobince
+2  A: 

You could try using the HEX() function [http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function%5Fhex%5D. From the documentation, SELECT HEX('abc'); returns 616263.

Matt McClellan