Hello All,

I read on the MySQL Performance Blog that when tables are large, it can be better to scan the full table instead of using indexes.

I have a table with tens of millions of rows. When running queries, if I use no indexes, the queries are 24 times slower than with indexes. I know a lot of things may affect this (e.g., whether rows are stored sequentially), but can you give me some hints about what might be happening, or how I should start examining the issue? I want to understand when using indexes is preferable and when it is not.

Thanks

A: 

It is good practice to put an index on each column that you use in a WHERE clause.
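
For example (table, column, and index names here are made up for illustration), if your queries usually filter on a customer_id column, an index on that column lets MySQL find the matching rows without scanning the whole table:

    -- Hypothetical table and column, for illustration only.
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);

    -- This query can now locate matching rows through the index:
    SELECT order_id, total
    FROM orders
    WHERE customer_id = 42;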

Salil
+6  A: 

The article says that when dealing with very large data sets, where the number of rows you need to work with approaches the number of rows in the table, using an index might hurt performance.

In this case, going through the index will indeed hurt performance, as long as you need more data than is present in the index.

To go through the index, the database engine first has to read large parts of the index (which is itself a kind of table), and then, for each row (or set of rows) in that result, go to the real table and start cherry-picking pages to read.

If, on the other hand, you only need to retrieve columns that are already part of the index, then the database engine only has to read the index and does not need to go on to the full table for more data.
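
As a rough sketch (table, column, and index names are assumed), MySQL's EXPLAIN reports "Using index" in the Extra column when a query can be answered from the index alone:

    -- Hypothetical schema, for illustration only.
    CREATE INDEX idx_users_last_name ON users (last_name);

    -- Covered by the index: no table lookup needed,
    -- EXPLAIN's Extra column should show "Using index".
    EXPLAIN SELECT last_name FROM users WHERE last_name = 'Smith';

    -- Not covered: email is not in the index, so each match
    -- requires an extra lookup into the table itself.
    EXPLAIN SELECT email FROM users WHERE last_name = 'Smith';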

If you end up reading most, or close to most, of the actual table in question, all the work required to deal with the index might be more overhead than just doing a full table scan to begin with.

Now, this is all the article is saying. For most work dealing with a database, using indexes is exactly the right thing to do.

For instance, if you need to extract a small set of rows, going through an index instead of a full table scan will be many orders of magnitude faster.

In any case, if you're in doubt, do some performance profiling to find out how your application behaves under different types of load, and then start tweaking. Don't take a single article as a silver bullet for anything.
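
One quick way to compare the two access paths on your own data (a sketch only; the table, column, and index names are assumed) is to time the same query with and without the index, using MySQL's index hints:

    -- Force the optimizer to go through the index:
    SELECT COUNT(*) FROM big_table FORCE INDEX (idx_val) WHERE val < 100;

    -- Force a full table scan for comparison:
    SELECT COUNT(*) FROM big_table IGNORE INDEX (idx_val) WHERE val < 100;

Compare the timings and the EXPLAIN output of both variants under a realistic load.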

For instance, one way to speed up the example queries in the article that do a count on the pad column would be to create a single index covering both val and pad. That way, the count becomes just an index scan, not an index scan plus a table lookup, and it would run faster than the full table scan.
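
In MySQL that could look something like this (the table name is assumed, and the query is only the general shape of the article's example, not an exact copy):

    -- Composite index covering both columns from the article's example:
    ALTER TABLE t ADD INDEX idx_val_pad (val, pad);

    -- The count can now be answered from the index alone
    -- (an index scan, with no lookups into the table):
    SELECT COUNT(pad) FROM t WHERE val = 5;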

Your best option is to know your data, to experiment, and to know how the tools you use work. So, by all means, learn more about indexes, but in the end it is you who decides what is best for your program.

Lasse V. Karlsen
As the article says, "Even if you look at 1% of rows or less, full table scan may be faster." So not to be lame, but the answer seems to be "it depends on the specific situation."
Yar
Yes, indeed it does.
Lasse V. Karlsen
+2  A: 

As always, it depends. I've so far never run into a scenario like the one described in that blog post. Using indexes in my queries on large tables (50+ million rows) has been on the order of 100 to 10,000 times faster than doing a full table scan on those big tables.

There's probably no silver bullet here; you have to test with your particular data and your particular queries.

nos