I've got a database with a bunch of broken UTF-8 characters scattered across several tables. The list of characters isn't very extensive AFAIK (áéíúóÁÉÍÓÚÑñ).

Fixing a given table is very straightforward:

update orderItem set itemName=replace(itemName,'Ã¡','á');

But I can't find a way to detect the broken characters. If I do something like

SELECT * FROM TABLE WHERE field LIKE "%Ã%";

I get nearly all the rows because of the collation (Ã=a). All broken characters so far start with an "Ã". The database is in Spanish, so this particular character isn't otherwise used.

The list of broken chars I've got so far is:

Ã¡ = á
Ã© = é
Ã- = í
Ã³ = ó
Ã± = ñ
Ã = Á (the second byte, 0x81, is unprintable)

Any idea how to make this SELECT work as intended? (A binary comparison, or something like that.)

+1  A: 

How about a different approach, namely converting the column back and forth to get the correct character set? You can convert it to binary, then to utf-8 and then to iso-8859-1 or whatever else you're using. See the manual for the details.

wds
The idea is to end up with a UTF-8 encoded database. Right now the encoding and collation are UTF-8 general, but apparently the application that uses the database was interpreting it as ISO-8859. If I convert it back and forth, I will end up with the same data...
The Disintegrator
Well, converting back and forth alone doesn't do magic; the characters are still broken. BUT in binary I can run a SELECT looking for the Ã character. So now I have a mechanism to detect the broken characters. Thanks.
The Disintegrator
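A minimal sketch of the binary detection described in the comment above, assuming the orderItem table and itemName column from the question, and assuming the client connection uses the same UTF-8 character set as the column. LIKE BINARY compares byte by byte, so the accent-insensitive collation no longer treats Ã as a. HEX() is there only to make the stored bytes visible; the stored_bytes alias is illustrative.

```sql
-- Find rows containing the literal character Ã (bytes C3 83 in UTF-8).
-- Assumption: the connection charset matches the column's UTF-8 charset.
SELECT itemName, HEX(itemName) AS stored_bytes
FROM orderItem
WHERE itemName LIKE BINARY '%Ã%';
```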
Okay, I remain convinced that there must be a way to use the conversion mechanism in a more general way but it might be more complex than first stated. Happy you found a solution that worked.
wds
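One way the conversion idea might be applied more generally, sketched under assumptions: the column is declared as utf8 but holds text that was encoded twice (UTF-8 bytes read as latin1 and then stored again as UTF-8). Converting to latin1, casting to binary, and re-reading those bytes as utf8 should undo one layer of encoding. Table and column names come from the question; the stored and repaired aliases are only illustrative.

```sql
-- Preview only: nothing is modified.
-- Step 1: CONVERT(... USING latin1) turns 'Ã¡' back into the bytes C3 A1.
-- Step 2: CAST(... AS BINARY) drops the character set label.
-- Step 3: CONVERT(... USING utf8) reads C3 A1 as the single character 'á'.
SELECT itemName AS stored,
       CONVERT(CAST(CONVERT(itemName USING latin1) AS BINARY) USING utf8) AS repaired
FROM orderItem
WHERE itemName LIKE BINARY '%Ã%';
```

A value that mixes broken and correctly stored accented characters will not survive this round trip intact, so it is worth checking the preview before turning it into an UPDATE.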
+1  A: 
update `table_name` set `column_name` = replace(`column_name` ,'Ã¡','á');
update `table_name` set `column_name` = replace(`column_name` ,'Ã©','é');
update `table_name` set `column_name` = replace(`column_name` ,'í©','é');
update `table_name` set `column_name` = replace(`column_name` ,'Ã³','ó');
update `table_name` set `column_name` = replace(`column_name` ,'íº','ú');
update `table_name` set `column_name` = replace(`column_name` ,'Ãº','ú');
update `table_name` set `column_name` = replace(`column_name` ,'Ã±','ñ');
update `table_name` set `column_name` = replace(`column_name` ,'í‘','Ñ');
update `table_name` set `column_name` = replace(`column_name` ,'Ã','í');
update `table_name` set `column_name` = replace(`column_name` ,'â€“','–');
update `table_name` set `column_name` = replace(`column_name`,'â€™','\'');
update `table_name` set `column_name` = replace(`column_name`,'â€¦','...');
update `table_name` set `column_name` = replace(`column_name`,'â€“','-');
update `table_name` set `column_name` = replace(`column_name`,'â€œ','"');
update `table_name` set `column_name` = replace(`column_name`,'â€˜','\'');
update `table_name` set `column_name` = replace(`column_name`,'â€¢','-');
update `table_name` set `column_name` = replace(`column_name`,'â€¡','c');
update `table_name` set `column_name` = replace(`column_name`,'â€','"');
update `table_name` set `column_name` = replace(`column_name` ,'Â','');
Raúl Avila Solano