ansaurus

Question

How does MySQL use collations with indexes?

Answer 1

+1 A:

MySQL builds the index using the column's collation. So if you declare a column as utf8_unicode_ci, the index is effectively kept in utf8_unicode_ci order as well.

Keep in mind that using the index does not always completely bypass the collation's performance cost, but for most practical purposes it does.

Many database systems aren't CPU bound, so I doubt you would notice the impact.

Harrison Fisk 2009-03-12 02:54:07

Thanks for the answer.

thomasrutter 2009-03-12 04:06:33

Answer 2

+1 A:

I believe that the btree structure will be different because it has to compare the column values differently.

Look at these two query plans:

mysql> explain select * from sometable where keycol = '3';
+----+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | pro   | ref  | PRIMARY       | PRIMARY | 66      | const |   34 | Using where; Using index | 
+----+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+


mysql> explain select * from sometable where binary keycol = '3';
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | pro   | index | NULL          | PRIMARY | 132     | NULL | 14417 | Using where; Using index | 
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+

If we change the collation used for the comparison (here with the BINARY operator), MySQL is suddenly no longer able to seek into the index: the first plan is a ref lookup touching an estimated 34 rows, while the second is a full index scan (type index) over all 14417 entries. The actual values stored in the index are the same regardless of collation; MySQL still returns each value in its original casing whether the collation is case sensitive or case insensitive.

So lookups against a case insensitive collation should be a little less efficient.

However, I doubt you'd ever be able to notice the difference; note that MySQL's default collations are case insensitive, so the impact can't be that terrible.
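The seek-versus-scan difference in the two plans can be illustrated with a toy model. This is only a sketch in Python, not MySQL's implementation: a sorted list stands in for the B-tree, str.lower() stands in for a case-insensitive collation, and the names ci_key, seek_ci and scan_binary are made up for the illustration.

```python
import bisect

# Toy model only: a sorted list stands in for the B-tree and str.lower()
# stands in for a case-insensitive collation (both are assumptions made
# for illustration, not how MySQL works internally).
values = ["apple", "Banana", "banana", "Cherry", "APPLE", "cherry"]


def ci_key(s):
    """Hypothetical stand-in for a case-insensitive collation."""
    return s.lower()


# The "index" is ordered by the collation key, but it stores the original
# values, so the original casing is preserved.
index = sorted(values, key=ci_key)
keys = [ci_key(v) for v in index]


def seek_ci(target):
    """Binary search; valid because the list is sorted by ci_key."""
    lo = bisect.bisect_left(keys, ci_key(target))
    hi = bisect.bisect_right(keys, ci_key(target))
    return index[lo:hi]


def scan_binary(target):
    """Case-sensitive match. The list is not in code point order, so a
    binary search on the raw values would be invalid; this model simply
    checks every entry, loosely mirroring the full index scan."""
    return [v for v in index if v == target]


print(seek_ci("BANANA"))        # all case variants, found by seeking
print(scan_binary("banana"))    # exact match, found by scanning
print(index == sorted(index))   # is the index also in code point order?
```

The last line prints False for this data, which is the point of the model: an ordering built for one collation cannot be trusted for a seek under another.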

UPDATE:

You can see a similar effect for order by operations:

mysql> explain select * from sometable order by keycol collate latin1_general_cs;
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra                       |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------+
|  1 | SIMPLE      | pro   | index | NULL          | PRIMARY | 132     | NULL | 14417 | Using index; Using filesort | 
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------+

mysql> explain select * from sometable order by keycol ;
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
|  1 | SIMPLE      | pro   | index | NULL          | PRIMARY | 132     | NULL | 14417 | Using index | 
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+

Note the extra 'filesort' stage required to execute the first query. That means MySQL is queuing up the result in a temporary buffer and sorting it itself using a quicksort in an extra stage, throwing out whatever order the index provided. With the column's original collation this step is unnecessary, as MySQL already gets the rows in the right order from the index.
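Why the extra sort is needed can be sketched the same way. The following is a toy Python model under loud assumptions: str.lower() stands in for the column's case-insensitive collation, plain code point order stands in for "some other collation" (it is not meant to reproduce latin1_general_cs exactly), and order_by is a hypothetical helper, not a MySQL API.

```python
def ci_key(s):
    """Hypothetical stand-in for the column's case-insensitive collation."""
    return s.lower()


def other_key(s):
    """Hypothetical stand-in for a different collation: code point order."""
    return s


# Order in which the "index" hands back its entries (sorted by ci_key).
index = sorted(["b", "A", "a", "B"], key=ci_key)


def order_by(index_rows, index_key, wanted_key):
    """Return (rows, used_extra_sort).

    If the requested ordering is the one the index was built with, the
    rows are returned as they come. Otherwise the index order is thrown
    away and the rows are sorted again, like the filesort stage.
    """
    if wanted_key is index_key:
        return list(index_rows), False
    return sorted(index_rows, key=wanted_key), True


print(order_by(index, ci_key, ci_key))     # index order reused as is
print(order_by(index, ci_key, other_key))  # needs the extra sort
```

In this model the second call has to re-sort because the index order places "b" before "B", which is not valid under the other ordering.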

ʞɔıu 2009-03-12 03:03:17

Thanks - so if I understand correctly, the ordering of items in the b-tree _is_ affected by collation even though the actual values are preserved, so ORDER BY can still be efficient when using that collation. Let me know if I've misunderstood.

thomasrutter 2009-03-12 04:03:24

I think you've got it right.

ʞɔıu 2009-03-12 04:35:38

Ah I guess that 'using filesort' tells me what I needed to know. So was the actual collation of that column case insensitive? I guess at this point I ought to just test it out myself...

thomasrutter 2009-03-12 04:48:36

yes, the original was ci

ʞɔıu 2009-03-12 04:49:46

Yep. Tried with column collation of latin1_swedish_ci and "COLLATE latin1_general_cs" and yep that forces it into a filesort. Accepted.

thomasrutter 2009-03-12 04:54:04
