In my database we have fields where the data is not readable. I now know why it happened but I don't know how to fix it.

I found a way to get the info back from the database:

SELECT id,
       name 
  FROM projects 
 WHERE LENGTH(name) != CHAR_LENGTH(name);

One of the rows returned shows:

id   | name
-------------------------
1008 | CajÃ³n el Diablo

This should be:

id   | name
-------------------------
1008 | Cajón el Diablo

Can somebody help me figure out how to fix this problem? Can I convert the data using SQL? If SQL is not suitable, how about Python?

+1  A: 

Your MySQL data is most likely UTF-8 encoded.

The tool or client you are viewing the data with is either

  • Not talking to the MySQL server in UTF-8 (`SET NAMES utf8`)

  • Outputting UTF-8 characters in an environment that has an encoding different from UTF-8 (e.g. a web page encoded in ISO-8859-1).

You need to either specify the correct character set when connecting to the MySQL database, or convert the incoming characters so they can be output correctly.
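To illustrate the mismatch, here is a minimal Python sketch (standard library only) of what happens when UTF-8 bytes are read as if they were ISO-8859-1. The variable names are made up for the example, and the string is the project name from the question:

```python
# The correctly stored value from the question.
original = "Cajón el Diablo"

# In UTF-8, "ó" is stored as the two bytes 0xC3 0xB3.
utf8_bytes = original.encode("utf-8")

# A client that assumes ISO-8859-1 (latin1) shows each byte as its own character.
misread = utf8_bytes.decode("latin-1")

print(misread)  # CajÃ³n el Diablo
print(len(original), len(utf8_bytes))  # 15 16
```

If the garbled form is what is actually stored in the table, rather than just what the client displays, fixing the connection character set alone will not repair it.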

For more information, you would have to tell us what character set and collation your database and tables use, and what you are using to look at the data.

If you want to get into the basics of this, this is very good reading: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Pekka
Thank you Pekka. I think the issue is this: the application writing to the database didn't handle the data correctly, so the data in the table is actually latin1 but the database thinks it is utf8.
Eric
@Eric aww, that sucks. I don't think that can be solved within the database (except of course by running a series of `set field = REPLACE(field, 'Ã³', 'ó')` operations). Do you have access to a scripting language? In PHP, a `utf8_decode()` would do the trick I think.
Pekka
I think you are right. So, I guess I will try to find some creative way of getting this done. I can use Python. Thanks Pekka.
Eric
@Eric all right. Not directly related but maybe an inspiration: http://stackoverflow.com/questions/1177316/decoding-double-encoded-utf8-in-python
Pekka
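For reference, a minimal Python sketch of the repair discussed in the comments above, roughly what `utf8_decode()` would do in PHP. It assumes the stored text is UTF-8 that was misread as latin1 (or Windows-1252, which MySQL's latin1 closely resembles) and then saved again. The function name `fix_mojibake` is hypothetical, and the sketch only transforms strings; reading from and writing back to the table is left out:

```python
def fix_mojibake(text: str) -> str:
    """Try to undo UTF-8 text that was mis-decoded as latin1 or cp1252.

    Returns the input unchanged when it does not look double-encoded.
    """
    for codec in ("latin-1", "cp1252"):
        try:
            # Recover the original bytes, then decode them as the UTF-8 they were.
            return text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not representable in this codec, or not valid UTF-8 afterwards.
            continue
    return text


print(fix_mojibake("CajÃ³n el Diablo"))  # Cajón el Diablo
print(fix_mojibake("Cajón el Diablo"))   # Cajón el Diablo (already correct, left alone)
```

Because an already-correct value usually fails the round trip and is returned as-is, the function can be run over every row the `LENGTH(name) != CHAR_LENGTH(name)` query returns. It is still worth backing up the table and checking a sample of results before writing anything back.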