views:

380

answers:

3

Here is a screenshot of weird characters in my database. I know that this character combination stands for a crazy apostrophe. Should I just let these characters stay in my database? Or should I strip them out and replace them with normal apostrophes?

If I should strip them, is there a Ruby function to ensure that all characters saved to my database are normal (whatever normal means)?

Thank you!

+4  A: 

Leave them there, they're UTF-8 encoded.

Just make sure that when you display them, the output terminal (or web page) is also set to recognise UTF-8 encoding; they'll then be displayed automatically as the right character.

If you'd really prefer an ASCII single quote, by all means strip them, but generally it's better to let the system handle the UTF-8 data so that you can also handle accented characters, currency symbols, Klingon, etc...
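
If you do decide to strip them, here is a minimal sketch of one way to do it in Ruby. The names SMART_QUOTES and asciify_quotes are made up for illustration, and the list of characters is an assumption: it only covers the four most common typographic quotes, so extend it to match what actually shows up in your data.

```ruby
# Hypothetical helper: map typographic quotes to their ASCII equivalents.
SMART_QUOTES = {
  "\u2018" => "'",  # left single quotation mark
  "\u2019" => "'",  # right single quotation mark (the curly apostrophe)
  "\u201C" => '"',  # left double quotation mark
  "\u201D" => '"'   # right double quotation mark
}.freeze

def asciify_quotes(text)
  # gsub with a Hash replaces each match with the value stored under it.
  text.gsub(/[\u2018\u2019\u201C\u201D]/, SMART_QUOTES)
end

puts asciify_quotes("It\u2019s a \u201Ctest\u201D")  # prints: It's a "test"
```

Accented letters and other non-ASCII characters pass through untouched, which is usually what you want.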

Alnitak
OK, they were displayed properly on my site. I was a bit worried: what if I need to run some regexes on them? Do I have to use the hex codes?
Tony
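
On the regex question: as far as I know, Ruby 1.9 and later let you match on characters rather than raw bytes, so hex byte codes should not be needed as long as the string is tagged as UTF-8. A small sketch, with made-up sample text:

```ruby
# Sample text (invented for illustration): "It’s Tony’s café"
text = "It\u2019s Tony\u2019s caf\u00E9"

# Match the curly apostrophe by its code point; no byte values required.
apostrophe_count = text.scan(/\u2019/).length

# \p{L} matches any Unicode letter, accented ones included.
words = text.scan(/\p{L}+/)

puts apostrophe_count  # prints: 2
p words
```

You can also type the character directly into the regex if your source file is saved as UTF-8; the \u escape is just easier to read in a terminal that may not render it.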
A: 

Just a wild guess, but this really looks like an encoding mismatch between Latin-N (ISO-8859-N, usually N=1) and UTF-8. UTF-8 uses a 2-byte representation for Unicode values 128-255 (and more bytes for higher code points), whereas Latin-N uses a single byte for those. So it looks like the string is stored as UTF-8 but read as if it were Latin-1, so the wrong characters are decoded from the stored bytes.
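
The mismatch can be reproduced in Ruby. This is only a sketch of the suspected mechanism: I am assuming Windows-1252 (the common superset of Latin-1) as the wrong encoding, since I can't tell from the screenshot which one is actually in play.

```ruby
original = "It\u2019s"  # UTF-8 text containing a curly apostrophe

# Pretend the UTF-8 bytes are Windows-1252, then convert "back" to UTF-8.
# This is the mistake a misconfigured reader makes.
garbled = original.dup.force_encoding("Windows-1252").encode("UTF-8")
puts garbled  # prints: Itâ€™s

# If the data has only been through one such round, the damage is reversible:
repaired = garbled.encode("Windows-1252").force_encoding("UTF-8")
puts repaired == original  # prints: true
```

The apostrophe becomes three characters here because U+2019 takes three bytes in UTF-8 (E2 80 99), and each byte is decoded as a separate character.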

StaxMan
+1  A: 

I somewhat agree with Alnitak. I just wanted to point out that keeping them might affect the order in which records are returned when you select them ordered by that field. This is most obvious with accented characters: an accented "o" might be saved as something like "A~", so it will appear before an "a". Also be careful if you perform any other operation in the database, such as: SELECT * FROM TABLE WHERE LENGTH(FIELD) > 5. If LENGTH counts bytes rather than characters (as MySQL's LENGTH() does; CHAR_LENGTH() counts characters), each multi-byte character will count as 2 or more.
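
The same byte-versus-character distinction is easy to see from Ruby. This is an illustration only: it shows Ruby's own behaviour, not your database's, whose sorting depends on the collation configured for the column. The sample words are made up.

```ruby
name = "ni\u00F1o"  # "niño": four characters, one of them two bytes in UTF-8

puts name.length    # prints: 4  (characters)
puts name.bytesize  # prints: 5  (bytes)

# Ruby compares strings byte by byte, so an accented first letter
# sorts after every plain ASCII letter.
sorted = ["zebra", "\u00E9clair", "apple"].sort
p sorted
```

Here "éclair" ends up after "zebra", which is the kind of surprise a byte-oriented comparison produces.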

HC