views:

362

answers:

3

Hi,

I`ve read that columns that are chosen for indices should discrimate well among the rows, i.e. index columns should not contain a large number of rows with the same value. This would suggest that booleans or an enum such as gender would be a bad choice for an index.

But say I want to find users by gender and in my particular database, only 2% of the users are female, then in that case it seems like the gender column would be a useful index when getting the female users, but not when getting all the male users. So would it generally be a good idea to put an index on such a column?

Thanks, Don

A: 

This is a case where I would let the server statistics inform me of when to create the index. Unless you know that this query is going to predominate or that running such a query would not meet your performance goals a priori, then creating the index prematurely may just cost you performance rather than increase it. Also, you may want to think about how you would actually use the query. In this case, my guess would be that you'd typically be doing some sort of aggregation based on this column rather than simply selecting the users who meet the criteria. In that event, you'll be doing the table scan anyway and the index won't buy you anything.

tvanfosson
+1  A: 

Indexing a low-cardinality column to improve search performance is common in my world. Oracle supports a "bitmapped index" which is designed for these situations. See this article for a short overview.

Most of my experience is with Oracle, but I assume that other RDBMS' support something similar.

JPLemme
+1  A: 

Don't forget, though, that you'll probably only be selecting for females about 2% of the time. The rest of the time, you'll be searching for males. And for that, a straight table scan (rather than an index scan plus accessing the data from the table) is going to be quicker.

You can also, sometimes, use a compound index, with a low cardinality column (enum, boolean) coupled with a higher cardinality column (birth date, perhaps). This depends very much on the full data, and the queries you'll really use.

My experience is that an index on male/female is seldom going to be truly useful. And the general advice is valid. One more point to remember - indexes have to be maintained when you add or remove (or update) rows. The more indexes, the more work each modify operation has to do, slowing the system down.

There are whole books on index design.

Jonathan Leffler