Does anyone know of a quick and easy way to locate special characters that didn't get correctly converted when data was imported into MySQL?

I think this is an issue due to data encoding (e.g. Latin-1 vs. UTF-8). Regardless of where the issue first occurred, I'm stuck with junk in my data that I need to remove.

A: 

There's unlikely to be an easy function for this because, for example, a UTF-8 special character that was read as ISO-8859-1 shows up as two (or more) perfectly valid ISO-8859-1 characters. So while there are patterns in what those broken characters look like, there is no sure-fire way of identifying them.
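To see what such a broken character looks like, here is a minimal Python sketch that reproduces the damage. It assumes the column uses MySQL's `latin1`, which is really cp1252 rather than strict ISO-8859-1; the function name `garble` is made up for illustration.

```python
def garble(text: str) -> str:
    """Simulate the bad import: UTF-8 bytes read back as cp1252 characters."""
    return text.encode("utf-8").decode("cp1252")


print(garble("Ü"))  # Ãœ
print(garble("é"))  # Ã©

# Each garbled result is itself ordinary, valid text, which is why no
# function can tell for certain whether "Ã©" is damage or intended.
```

Note that Python's cp1252 codec rejects five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) that MySQL passes through, so this sketch raises an error for a handful of characters such as Á.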

You could build a search+replace function to replace the most common occurrences in your language (e.g. replacing Ãœ with Ü if UTF-8 data was imported as ISO-8859-1).
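A rough sketch of such a search+replace in Python follows. It rests on a few assumptions: the data is UTF-8 that was read as MySQL `latin1` (cp1252, with the five undefined bytes kept as control characters), and only the characters U+00A0 to U+00FF matter. All function names here are invented for the example.

```python
import re


def as_mysql_latin1(raw: bytes) -> str:
    """Decode bytes the way a latin1 (cp1252) column would show them."""
    chars = []
    for byte in raw:
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # Undefined in cp1252: keep the byte as a control character.
            chars.append(chr(byte))
    return "".join(chars)


def build_replacements(first: int = 0x00A0, last: int = 0x00FF) -> dict:
    """Map each garbled sequence to the character it was meant to be."""
    table = {}
    for code_point in range(first, last + 1):
        intended = chr(code_point)
        table[as_mysql_latin1(intended.encode("utf-8"))] = intended
    return table


def repair(text: str, table: dict) -> str:
    """Replace every garbled sequence in a single left-to-right pass."""
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: table[match.group(0)], text)


REPLACEMENTS = build_replacements()
print(repair("GrÃ¼ÃŸe aus MÃ¼nchen", REPLACEMENTS))  # Grüße aus München
```

The keys of `REPLACEMENTS` double as a starting list of sequences to search for. Because the garbled pairs are valid text in their own right, this will also rewrite any place where such a pair was intended, so it is worth reviewing the matches before updating the table.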

That said, it would be best to restart the import with the correct settings, if at all possible.

Pekka
Unfortunately, reimporting the data at this point isn't really an option, and there aren't many of these special characters sprinkled throughout the data. But even to write a search-and-replace script, you need a starting list of special characters to replace. That's the list I'm trying to create.
gurun8