These two queries give me exactly the same result:

select * from topics where name='Harligt';
select * from topics where name='Härligt';

How is this possible? It seems that MySQL treats å, ä and ö as a, a and o when it searches. Is there some way to turn this off?

I use UTF-8 encoding everywhere, as far as I know. The same problem occurs both from the terminal and from PHP.

+1  A: 

You want to check your collation settings. The collation is the property that determines which characters compare as equal.

These two pages should help you:

http://dev.mysql.com/doc/refman/5.1/en/charset-general.html

http://dev.mysql.com/doc/refman/5.1/en/charset-mysql.html
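
As a rough sketch of what to look at (assuming the `topics` table from the question), these statements should show which collations are currently in effect:

```sql
-- Collations used by the server, the current database and the connection.
SHOW VARIABLES LIKE 'collation%';

-- Collation of every column in the table from the question.
SHOW FULL COLUMNS FROM topics;

-- Full table definition, including the table-level default collation.
SHOW CREATE TABLE topics;
```

The column-level collation is the one that matters for a comparison like `name='Harligt'`, so the `Collation` value reported for `name` is the first thing to check.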

oedo
+15  A: 

Yes, this is standard behaviour in the non-language-specific Unicode collations.

9.1.13.1. Unicode Character Sets

To further illustrate, the following equalities hold in both utf8_general_ci and utf8_unicode_ci (for the effect this has in comparisons or when doing searches, see Section 9.1.7.7, “Examples of the Effect of Collation”):

Ä = A Ö = O Ü = U

See also Examples of the effect of collation
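
A quick way to see these equalities directly, without touching any table, might look like the sketch below. It assumes the client sends the literals as UTF-8 (the `_utf8` introducer tells the server to read them that way); the column aliases are only labels chosen for this example.

```sql
-- Compare the same two literals under three different collations.
-- COLLATE binds more tightly than "=", so it applies to the right-hand literal
-- and thereby decides the collation used for the whole comparison.
SELECT
  _utf8'Ä' = _utf8'A' COLLATE utf8_general_ci AS general_ci,
  _utf8'Ä' = _utf8'A' COLLATE utf8_unicode_ci AS unicode_ci,
  _utf8'Ä' = _utf8'A' COLLATE utf8_bin        AS bin;
```

Going by the manual excerpt above, the first two columns should come back as 1 (equal) and the `utf8_bin` column as 0.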

A workaround would be to use a different collation for the comparison. I'm looking up some links right now.

Update:

I can't test this myself right now, but applying the utf8_bin collation to your SELECT should block the implicit umlaut conversion for that query only:

select * from topics where name='Harligt' COLLATE utf8_bin;

It becomes more difficult if you want to do a case-insensitive LIKE without the umlaut conversion. I know of no MySQL collation that is case-insensitive and does not do this kind of implicit umlaut conversion. If anybody knows one, I'd be interested to hear about it.
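
One possible workaround, untested and offered only as a sketch: lower-case both sides yourself and compare them with utf8_bin. This assumes the connection character set is utf8 and reuses the `topics` table from the question; the search pattern is just an example value.

```sql
-- Case-insensitive prefix search that still distinguishes 'a' from 'ä'.
-- LOWER() folds case on both sides; utf8_bin then compares the results
-- strictly by code point, so no umlaut folding takes place.
SELECT *
FROM topics
WHERE LOWER(name) LIKE LOWER('härligt%') COLLATE utf8_bin;
```

Because the column is wrapped in `LOWER()`, an index on `name` will most likely not be used, so this is better suited to small tables or occasional queries.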

Pekka
+3  A: 

Since you are in Sweden I'd recommend using the Swedish collation. Here's an example showing the difference it makes:

CREATE TABLE topics (name varchar(100) not null) CHARACTER SET utf8;

INSERT topics (name) VALUES ('Härligt');

select * from topics where name='Harligt';
'Härligt'

select * from topics where name='Härligt';
'Härligt'    

ALTER TABLE topics MODIFY name VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_swedish_ci;

select * from topics where name='Harligt';
<no results>

select * from topics where name='Härligt';
'Härligt'

Note that in this example I changed only the one column to the Swedish collation, but you should probably do it for your entire database: all tables and all varchar columns.
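
Applying it more widely could look roughly like this. The database name `mydb` is a placeholder, not something from the question:

```sql
-- Change the default collation for tables created in this database from now on.
-- Existing tables and columns keep their current collation.
ALTER DATABASE mydb CHARACTER SET utf8 COLLATE utf8_swedish_ci;

-- Convert an existing table: the table default and all of its
-- character columns (CHAR, VARCHAR, TEXT) are switched in one statement.
ALTER TABLE topics CONVERT TO CHARACTER SET utf8 COLLATE utf8_swedish_ci;
```

The `CONVERT TO` statement has to be repeated for each existing table, since changing the database default alone does not touch them.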

Mark Byers
This should work for languages other than Swedish, so I think utf8_bin is the collation I want. But thanks for letting me know that I can change the collation on only one column; that will be really helpful.
Martin
Note that utf8_bin is case sensitive so "härligt" != "Härligt" (applies to unique indexes too).
Serbaut