views:

311

answers:

2

I am trying to achieve accent and case-insensitive sorting in MySQL. Following the instructions in the manual, this is supposed to work with the utf8 character set and utf8_general_ci collation.

When I follow the example in the manual (http://dev.mysql.com/doc/refman/5.1/en/charset-collation-implementations.html) under "Collations for Unicode multi-byte character sets" I do not get the same results:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 679877
Server version: 5.1.41-log MySQL Community Server (GPL) by Remi

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';
+-----------+-----------+-----------+
| 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
+-----------+-----------+-----------+
|         1 |         0 |         0 |
+-----------+-----------+-----------+
1 row in set (0.00 sec)

mysql> 

In the example in the manual, those are all 1.

It also fails to treat accented characters equally when I try to set the collation directly in a query. In this example, the table is using latin1 and I'm converting to utf8.

mysql> select * from test;
+----------+
| k        |
+----------+
| Cárdenas |
| Cardozo  |
| Corbin   |
| Cabrero  |
+----------+

mysql> select k from test order by convert(k using utf8) collate utf8_general_ci
;
+----------+
| k        |
+----------+
| Cabrero  |
| Cardozo  |
| Corbin   |
| Cárdenas |
+----------+
4 rows in set (0.00 sec)

It should be ignoring the accent over the 'a' in the last entry and sorting it second. Any ideas what I'm doing wrong?

A: 

I might be missing something here...but can't you just make a function (say removeAccents) that takes a string and returns the non-accent equivalent string, and then sort by removeAccents(field). I believe you can create an index for that as well, which should help with the performance.

aip.cd.aish
+1  A: 

It works on my default MySQL installation. Since you haven't provided a SHOW FULL COLUMNS from test (@fsb comment) it's still possible that the collation in your table structure is incorrect.

  • Is the the collation of column 'k' set to something other than utf8_general_ci?

  • Check whether SELECT k from Names ORDER BY k ASC is giving the right answer

  • Has the MySQL installation Index.xml file been modified to change the meaning of utf8_general_ci?

Relevant part of my installation for comparison:

<charset name="utf8">
  <family>Unicode</family> 
  <description>UTF-8 Unicode</description> 
  <alias>utf-8</alias> 
  <collation name="utf8_general_ci" id="33">
    <flag>primary</flag> 
    <flag>compiled</flag> 
  </collation>
  <collation name="utf8_bin" id="83">
    <flag>binary</flag> 
    <flag>compiled</flag> 
  </collation>
 </charset>
  • Has the compiled code been modified to change the meaning of utf8_general_ci?

Conceivably someone else has tweaked one of these things for some nefarious purpose...

MZB