I've noticed that in Drupal, after some users enter content into the body of a node, characters like quotes and apostrophes get saved as: â€™ â€œ

This is presumably due to the user entering odd characters or something...

  1. Does this mean the Drupal database hasn't been configured for utf8?
  2. How can this be corrected so â€™ is actually saved as just '?

UPDATE: Turns out, in my case, the garbled characters were a result of my downloading the Drupal MySQL database dump, which gets saved as ASCII, and loading that into MySQL to repopulate the database. This happened twice: once when I migrated hosting, and a second time when I messed up the DB and had to restore it manually.

The solution, if you are using the Drupal Backup and Migrate module and you download the backup file and have to restore it manually, is to convert the file in Notepad or whatever from ASCII to UTF-8. I tested this and it works.
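For anyone who would rather script the conversion than use Notepad, here is a minimal sketch. It assumes the downloaded dump really is stored in a single-byte encoding such as windows-1252 (what Notepad calls "ANSI"); the function name and file names are made up for illustration. If the file already contains UTF-8 bytes, running this would double-encode it, so check the file first.

```python
from pathlib import Path


def convert_dump_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-encode a text file from source_encoding to UTF-8.

    Hypothetical helper: source_encoding is an assumption about how the
    dump was saved, not something Drupal or MySQL reports.
    """
    raw = Path(src).read_bytes()
    # Decode with the encoding the file was actually written in ...
    text = raw.decode(source_encoding)
    # ... and write the same characters back out as UTF-8 bytes.
    Path(dst).write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical file names, and only run if the dump is present.
    if Path("backup.mysql").exists():
        convert_dump_to_utf8("backup.mysql", "backup.utf8.mysql")
```

Working on bytes rather than text mode keeps the line endings in the dump untouched.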

A: 
  1. Does this mean the Drupal database hasn't been configured for utf8?

I'd say the most likely possibility is that the database table(s) are set to latin1. Take a look. Alternatively, it could be that the database connection is not UTF-8 encoded (sending a `SET NAMES utf8;` query sometimes helps).

Pekka
I checked the database and the table doesn't appear to be set to anything, which I assume means it defaults to latin1... If that is the case, what is the "proper" way to set the table to UTF-8?
JonnyJon
@Jonny the table always has an encoding, so better to get the specific info (although they *do* default to `latin1_swedish_ci`, MySQL being a Swedish product). Can you use a client like phpMyAdmin or HeidiSQL to check it out?
Pekka
The database is set to: character set utf8, collation utf8_general_ci. The table node_revisions doesn't appear to have any specific collation or charset.
JonnyJon
@Jonny strange. Is the HTML form you enter the data in encoded UTF-8?
Pekka
A: 

â€™ â€œ is the UTF-8 encoding of ’ “ misinterpreted as windows-1252 (not latin-1).
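The misinterpretation can be reproduced, and reversed, in a few lines of Python. This is only a sketch to illustrate the mechanism; the function names `garble` and `repair` are made up, and the repair only works if the text was garbled exactly once and every character survives the round trip through windows-1252.

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo the mistake: back to the original bytes, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


original = "\u2019 \u201c"   # right single quote, left double quote
mojibake = garble(original)

print(mojibake)                      # each character becomes three
print(repair(mojibake) == original)  # the round trip restores the text
```

Each curly quote is three bytes in UTF-8 (E2 80 99 and E2 80 9C), and windows-1252 maps each of those bytes to its own character, which is why one quote turns into three symbols.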

Does this mean the Drupal database hasn't been configured for utf8?

That's one possibility. Others are:

  • The program that puts data in the database is broken.
  • The program that retrieves data from the database is broken.
dan04