I'm not 100% sure on the particulars, so I'd love someone straightening me out, but I'll forge ahead with what I think is going on...
When I first setup my database, I used the default character encoding of the system without even thinking, and it was latin1. I never even thought about i18n/l10n. It just didn't occur to me. I just accepted the defaults and went with it.
Anyways, I've been using the database exclusively for a Rails app, and we've now got several GB of data, 100,000s of rows, and many international users. I've noticed that many of our foreign users are inserting data that seems to be Unicode / non-latin1. Here is an example:
What about crazy Unicode stuff? ☢ ☠ ☭
database.yml
Here is our database.yml file.
development:
adapter: mysql
database: XXX
username: YYY
password: ZZZ
host: localhost
encoding: utf8
As you can see, we're setting our character encoding to utf8. However, all our tables have a default character set of latin1. I'm sure of this.
Update After looking closely, our production database.yml does not specify an encoding, while my local copy was specifying utf8. This was causing problems when I would dump the production database and import it locally. It seems now that the import was working fine, but Rails was reading it incorrectly.
mysql CLI tool
When I view the data via the mysql CLI tool, it displays all the Unicode characters correctly. However, the 'show create table' statement clearly shows that the tables are default charset latin1. This leads me to believe that MySQL is somehow smart enough to store non-latin1 data.
HTTP header
Our HTTP Content-Type header is set to utf-8, like so:
Content-Type: text/html; charset=utf-8
Conversion Attempts
I've played a little with converting our tables to utf-8 encoding, all with no success. Mainly I tried dumping the database, running iconv to convert, then re-importing with the tables set to utf-8. MySQL had no errors, but all the Unicode data was garbled.
What to do?
I'm kind of stuck as to what to do (if anything). I'm a strong believer in not fixing what isn't broken, but this whole situation worries me. We've never had any complaints from users about not being able to store their data, and everything seems to be working fine. I'd just like to know what exactly is going on, who/what is doing the conversion (MySQL? Ruby? Rails? MySQL connection?), and any tips on how to proceed.