tags:

views:

49

answers:

3

For example if I search for the name Åsa i only want to get the name Åsa and not Asa, same with Björn instead of Bjorn

$query="select * from users where username like 'Björn'";
$result=mysql_query($query);
$num=mysql_num_rows($result);
echo"$num";
$i=0;
while($i<$num){     
$id=mysql_result($result,$i,"id");
$name=mysql_result($result,$i,"username");    
echo"<br/>$id,$name";
$i++;
}

Results

34,Björn
67,Bjorn

only result 34 is supposed to be displayed
I am using

mysql_query("SET NAMES utf8");
mysql_query( "SET CHARACTER SET utf8");

The database, table and column are set to utf8_unicode_ci

A: 

Don't use a LIKE query when what you mean is =.

$query="select * from users where username = 'Björn'";
VoteyDisciple
that didn't make any difference, still 2 results
Lawrence
+3  A: 

Your "problem" is the utf8_unicode_ci collation. That collation does "character expansions", meaning that Umlauts and their base characters are treated as the same even in a = comparison:

A = Ä
O = Ö
...

The second example on this mySQL manual page explains the issue: 9.1.7.8. Examples of the Effect of Collation

What you would have to do is either switch to a collation that distinguishes between umlaut and base character (e.g. utf8_general_ci or utf8_general_bin) or switch to a different collation only when doing the comparison:

select * from users where username like 'Björn' COLLATE utf8_general_ci;

this is obviously slower, because the collation conversion has to be done for each record during the query.

Pekka
thanks, it worked
Lawrence
utf8_bin is the only option that works for me, utf8_general_ci didn't work, i will change entire DB to utf8_bin
Lawrence
A: 

The database, table and column are set to utf8_unicode_ci

This collation deliberately makes many accented characters match their unaccented bases. Often that's what you want, sometimes it isn't. When it isn't, you have to pick a collation that best fits the rules of the language you're working with. Here's some tables of MySQL collations you can search to try to find something suitable.

utf8_swedish_ci is a commonly-used possibility that doesn't collate A/Å/Ä and O/Ö together, though other accented characters are still collated together. Unfortunately there isn't a good collation I can see that knows about a wide range of European accented letters but still treats them as different. If you don't need case-insensitive matching at all, you can of course use utf8_bin.

bobince
i'll use utf8_bin, store all usernames in lower characters in db. convert user input query search string to lower before searching.
Lawrence