Say I have a query that looks like this:

SELECT * FROM table WHERE category='5' and status='1' LIMIT 5

The table has 1 million rows.

To speed things up, I created a multi-column index on (status, category).

There are 600 categories but only 2 statuses (1 or 0). Is there any difference in performance if I create the index as (category, status) instead of (status, category)?
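For concreteness, the two candidate definitions might be written as below. This is only a sketch: `my_table` and the index names are placeholders (the question's `table` is a reserved word in most databases), and the `EXPLAIN` prefix assumes a MySQL- or PostgreSQL-style engine. Normally only one of the two indexes would be kept.

```sql
-- Option A: the index as currently defined (status first).
CREATE INDEX idx_status_category ON my_table (status, category);

-- Option B: the alternative being considered (category first).
CREATE INDEX idx_category_status ON my_table (category, status);

-- Ask the optimizer how it would run the query with whichever index exists.
EXPLAIN SELECT * FROM my_table WHERE category = '5' AND status = '1' LIMIT 5;
```

Comparing the plan with each index in place is a quick way to check the question against real data.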

+1  A: 

There shouldn't be any difference for this query. Both columns are compared with equality, so the selectivity of the index is identical whether you order it (category, status) or (status, category).

By the way, using LIMIT is often meaningless without also using ORDER BY. The order of rows returned by an SQL query is arbitrary unless you specify an order.
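As an illustration, a version of the query with a defined order could look like the following. The `id` column and the name `my_table` are assumptions, not something stated in the question; any column with a meaningful order would do.

```sql
-- Return the five matching rows with the smallest id
-- (a well-defined set of rows, assuming id is unique).
SELECT * FROM my_table
WHERE category = '5' AND status = '1'
ORDER BY id
LIMIT 5;
```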


Re your comment: Yes, it's common to need five random rows, but arbitrary is not the same as random. It's not common to need five arbitrary rows.

Bill Karwin
Meaningless? But what if I only want 5 rows, regardless of the order? Does it make sense to have this query then? (Of course, it would be better to add ORDER BY, but I'm afraid that will slow the query down further.)
Eric Sim
+2  A: 

Status first. The trick is that if you only need to query by category, you still can:

SELECT * from table where status in (1,0) and category = 'whatever'

and still get index support. Of course, if all your queries use both columns, it's the same either way. But with status first, a query that filters only on status is much better off, and a query that filters only on category is only slightly worse, if at all.
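A sketch of the two access patterns with a status-first index, using the placeholder table name `my_table` and an assumed index name. Whether the optimizer actually uses the index for the `IN` form depends on the database and its statistics, so it is worth confirming with the engine's query-plan tool.

```sql
-- Single composite index with the low-cardinality column first.
CREATE INDEX idx_status_category ON my_table (status, category);

-- Filter on status only: constrains the leading column of the index directly.
SELECT * FROM my_table WHERE status = 1;

-- Filter on category only: list every status so the leading column is still constrained.
SELECT * FROM my_table WHERE status IN (0, 1) AND category = '5';
```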

If the table also sees a lot of inserts, you want to minimize the number of indices, so a single composite index like this is a better bet than several separate indices.

anq
+1 for minimizing indexes if inserts are common. Many people take creating indexes too lightly in spite of their cost.
Chimmy