views:

111

answers:

1

I have a general question about the way that database indexing works, particularly in mysql. Let's say I have a table with a million rows with a column "ClientID" that is distributed relatively equally among 30 values. Thus, this column is very low cardinality (30) relative to the primary key (1 million).

Now, I understand that you shouldn't create indexes on low cardinality fields. However, in this case, queries are only ever done with one of the 30 clientIDs. Thus, wouldn't creating an index on ClientID be helpful, as the search space is automatically reduced to 1/30th what it normally would be? Or is my understanding of how the index works flawed?

Thanks

+2  A: 

What I've learned as a very general rule of thumb from the MySQL-sponsored seminars I've attended is that you gain SOME level of performance benefit in most cases where the cardinality reduces your search space by 80% or more. This seems to hold across just about any hardware setup I've encountered.

This looks like an index you would almost certainly benefit from. Of course, this can quickly break down on a heavy-write table where the index constantly needs to be adjusted.

In other circumstances, the proper answer is really 'it depends', and benchmarking is usually the way to go.

You may also want to check the BENCHMARK() function in mysql - it's not perfect but can help in testing things like this: http://dev.mysql.com/doc/refman/5.1/en/information-functions.html#function_benchmark

AvatarKava
So the OP should index, since 1/30 is a reduction by (very) roughly 95% ?
lexu
Correct. An example of a field that COULD be bad to index would be a TINYINT(1) being used as a boolean flag. Naturally, your queries matter a lot here and your 'penalty' may range from wasted disk space to terrible lookup times - EXPLAIN is your friend here :)
AvatarKava
Merci for the explanation. Reading SO and learning is a joy!
lexu
Thanks! That's what I suspected, but I wasn't sure.
Kevin J