tags:

views:

131

answers:

2

I have MySQL 5.xx running on Linux system. My application writes correctly ä, ö, å etc. characters to database and even gets these values correctly. But when I use WHERE to filter search for char 'ä', it will return also Strings that contain 'a' chars. Why MySQL thinks that a is equal to ä?

Example query:

SELECT column FROM table WHERE field='%ä%';

+6  A: 

MySQL's uses collations to compare character values.

Collations are the sets of rules used by database to define which characters are different and which are not when comparing.

Case sensitive collations distinguish between 'QUERY' and 'query', case insensitive do not.

Accent sensitive collations distinguish between 'résumé' and 'resume', accent insensitive do not.

In your column's default collation (most probably UTF8_GENERAL_CI), umlauted characters are indistinguishable from non-umlauted:

SELECT 'a' LIKE '%ä%'
---
  1

To distinguish between them, use binary collation (which treat all characters with different unicodes as different characters):

SELECT 'a' LIKE '%ä%' COLLATE UTF8_BIN
---
  0
Quassnoi
+2  A: 

Note: for many applications, collating a and ä as the same letter is considered a feature. My suggestion: be sure to double-check with your clients to determine which behavior is desired.

I might even follow up with a memo that says, "As we discussed on x date, the system will sort and find characters as follows..."

Larry K
Yes, this is true. This is done by google, so "ż", "ź" and "z" are all treated as "z", and this makes searching polish texts possible. The point is some languages like polish similar words rarely have totally different meaning, that can be altered by scratching national characters (usually context make possible to understand).
smok1