ansaurus

Question

Index performance with WHERE clause in SQL

Answer 1

+5 A:

Usually (there are other options) a database index is a B-Tree, which means that you can do range scans on it (including equality scans).

The condition app_user_id % 10 = 0 cannot be evaluated with a single range scan, which is why a database will probably not use an index.

It could still decide to use the index in another way, namely for a full scan: Reading the whole table takes more time than just reading the whole index. On the other hand, after reading the index you may still get back to the table, so the overall cost may end up being higher.

This is up to the database query optimizer to decide.

A few examples:

 select app_user_id from t where app_user_id % 10 = 0

Here, you do not need the table at all, all necessary data is in the index. The database will most likely do a full index scan.

 select count(*) from t where app_user_id % 10 = 0

Same. Full index scan.

 select count(*) from t

Only if app_user_id is NOT NULL can this be done with the index (because NULL data is not in the index, at least on Oracle, at least on single column indexes, your database may handle this differently).

Some databases do not need to do access table or index for this, they maintain row counts in the metadata.

 select * from t where app_user_id = 5

This is the classic scenario for an index. The database can look at the small section of the index tree, retrieve a small (just one if this was a unique or primary index) number of rowids and fetch those selectively from the table.

 select * from t where app_user_id between 5 and 10

Another classic index case. Range scan in the tree returns a small number of rowids to fetch from the table.

 select * from t where app_user_id between 5 and 10 order by app_user_id

Since index scans return ordered data, you even get the sorting for free.

 select * from t where app_user_id between 5 and 1000000000

Maybe here you should not be using an index. It seems to match too many records. This is a case where having bind variables hide the range from the database could actually be detrimental.

 select * from t where app_user_id between 5 and 1000000000 
    order by app_user_id

But here, since sorting would be very expensive (even taking up temporary swap disk space), maybe iterating in index order is good. Maybe.

 select * from t where app_user_id % 10 = 0

This is difficult to decide. We need all columns, so ultimately the query needs to touch the table. The question is whether to go through an index first. The query returns approximately 10% of the whole table. That is probably too much for an index access path to be efficient. If the optimizer has reason to believe that the query returns much less than 10% of the table, an index scan followed by accessing the table might be good. Same if the table is very fragmented (lots of deleted rows eating up space).

Thilo 2010-10-20 04:42:45

So basically your answer says it depends on the database.

controlfreak123 2010-10-20 04:54:46

and the data. and configuration options. Also, it does not matter if an index gets used or not. It only matters whether that would be faster than other alternatives. In any case, your reasoning should be "what queries do I need to run" and then design schema and indexes accordingly. If your % 10 query does not perform well enough, you may need to denormalize the column, or build a functional index.

Thilo 2010-10-20 04:58:26

ansaurus

tags:

views:

answers:

Index performance with WHERE clause in SQL

related questions