views:

85

answers:

2

I need to differentiate between nodes starting with 'O' and nodes starting with 'Ö' (o umlaut).

The problem is that since the node table and the title column have utf8_general_ci collation, MYSQL does not treat o and ö umlaut differently and this query returns nodes starting with O AND nodes starting with Ö

SELECT node.nid AS nid
FROM node AS node
WHERE node.status <> 0
AND SUBSTR( node.title, 1, 1 ) = 'O'

Since all Drupal modules use utf8_general_ci collation, I guess changing the collation of the table and entire database is probably not a good idea.

What is a good workaround for this?

+2  A: 

You can do a comparison using a different collation to the one the columns are using:

mysql> SELECT 'foo'='föo' COLLATE 'utf8_general_ci';
1
mysql> SELECT 'foo'='föo' COLLATE 'utf8_swedish_ci';
0

(o and ö being different letters in the Swedish collation. You could also use utf8_bin if you don't want any different characters to match, even different cases of the same letter.)

However doing a compare of columns where the collation is not the same as the collation of each column means indexes can't be used. So it may be more efficient simply to ALTER the tables to the more-specific collation you want.

bobince
Adding COLLATE to queries fell over with queries with joins so I ended up switching the database, tables and columns to utf8_swedish_ci using a script suggested by the Swedish Users Drupal Group. If anyone is interested it's here: http://www.phoca.cz/documents/38-tools/154-how-to-change-collation-in-database
kidrobot
While this a fine general MySQL solution, it requires circumventing Drupal's database API, as Drupal 6 currently doesn't support anything other than `utf8_general_ci`.
Mark Trapp
+2  A: 

Unfortunately, as you've noticed, Drupal 6 enforces utf8_general_ci collation.

However, this is a known issue, and people have been working on allowing the collation to be specified in settings.php: Database default collation is not respected

There's currently a patch in that issue (#90) for Drupal 6 in review that adds this ability. Once patched, all you need to do is add:

$db_collation = 'utf8_swedish_ci`;

to your settings.php file.

Mark Trapp
Thanks for the tip about the patch. Will give it a spin.
kidrobot