In MySQL, does SELECT foo perform better if foo is indexed?

At RedditMirror.cc, I have a database with 1.2 million records in the GrabbedSites table, a number that grows by approximately 500 to 2,000 per day.

Early in my career, I was taught that the only columns that should be indexed are those that

  1. you will filter or join on (WHERE or JOIN clauses in SELECT/UPDATE statements), or
  2. need to hold UNIQUE data.

Because of that, GrabbedSites has only one indexed column besides the primary key (categoryID), even though 8 columns are queried.

The website receives dramatic bursts of flash traffic, sometimes over 100,000 unique visitors a day, and the DB becomes "taxed" at about 20% usage.

So I'm wondering: would MySQL perform better if I added indexes to all 8 frequently queried columns?


Edit: Query is:

  SELECT url, 
         title, 
         published, 
         reddit_key, 
         UNIX_TIMESTAMP(last_fetched) last_fetched, 
         comment_link 
    FROM GrabbedSites 
   WHERE published BETWEEN DATE_SUB('2010-09-03', INTERVAL 1 DAY) 
                       AND '2010-09-03' 
ORDER BY published;

Only index is "published".

Explain says: Using where; Using filesort

A: 

If you filter on the columns you are planning to index, you might see a performance increase. Your database is mostly read-only: you only get 500 to 2,000 new rows a day, and you probably aren't doing many updates. So it's worth a try. Adding those indexes is unlikely to hurt your database much.

Pablo Santa Cruz
A:

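The first thing to be aware of is that MySQL generally uses only one index per table in each pseudo-SELECT (not per statement). The index merge optimization, available since MySQL 5.0, is an exception. When you view the output of a SELECT using EXPLAIN, you'll see which index was chosen for each table. In the MySQL versions current at the time of writing, EXPLAIN can only be run on SELECT statements (MySQL 5.6.3 later added DELETE, INSERT, REPLACE and UPDATE). So we have to assume that a DELETE or UPDATE uses the same plan as the equivalent SELECT.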
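A MySQL server can't be bundled into a short, self-contained script, so the sketch below uses Python's built-in sqlite3 module as a stand-in to illustrate the one-index-per-table point. The table layout and the index names idx_category and idx_published are hypothetical. SQLite's EXPLAIN QUERY PLAN output also differs from MySQL's EXPLAIN. In MySQL, an index merge would appear as type index_merge in the EXPLAIN output.

```python
import sqlite3

# Hypothetical, cut-down version of the GrabbedSites table (SQLite stand-in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GrabbedSites (
        id INTEGER PRIMARY KEY,
        categoryID INTEGER,
        published TEXT,
        url TEXT,
        title TEXT
    );
    CREATE INDEX idx_category ON GrabbedSites (categoryID);
    CREATE INDEX idx_published ON GrabbedSites (published);
""")

def query_plan(sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# Both filtered columns have their own single-column index...
plan = query_plan(
    "SELECT url, title FROM GrabbedSites "
    "WHERE categoryID = ? AND published BETWEEN ? AND ?",
    (3, "2010-09-02", "2010-09-03"),
)
# ...but the planner picks just one of them for this table.
print(plan)
```

A composite index on (categoryID, published) would let a single index serve both conditions.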

To my knowledge, most databases (embedded ones can be odd) can use indexes in the following clauses:

  • SELECT
  • JOIN (ANSI-92 syntax)
  • WHERE (filtering, plus join conditions written in ANSI-89 syntax)
  • HAVING (like WHERE, but unlike WHERE it can reference aggregates without needing a subquery)
  • ORDER BY

I'm not 100% on GROUP BY, so I'm omitting it for the time being.
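The ORDER BY case is the one behind the "Using filesort" in the question's EXPLAIN. The sketch below again uses Python's sqlite3 as a stand-in for MySQL, with a hypothetical table and index name. SQLite reports an explicit sort step as "USE TEMP B-TREE FOR ORDER BY", which is roughly its counterpart to MySQL's "Using filesort". With an index on published, the range search returns rows already in order, so the sort step disappears. MySQL's optimizer is cost-based, though. It may still choose a full scan plus filesort when it estimates that the range matches a large share of the table.

```python
import sqlite3

QUERY = (
    "SELECT url, title, published FROM GrabbedSites "
    "WHERE published BETWEEN '2010-09-02' AND '2010-09-03' "
    "ORDER BY published"
)

def plan_details(conn, sql):
    """Return the 'detail' strings from SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GrabbedSites "
    "(id INTEGER PRIMARY KEY, url TEXT, title TEXT, published TEXT)"
)

# No index on published: full table scan plus a separate sort step.
before = plan_details(conn, QUERY)

# Hypothetical index name; the range search now walks the index in order.
conn.execute("CREATE INDEX idx_published ON GrabbedSites (published)")
after = plan_details(conn, QUERY)

print(before)
print(after)
```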

Ultimately, it's the optimizer's choice what to use, based on its algorithm and the statistics it has on hand. You can use the ANALYZE TABLE syntax to refresh the statistics (periodically, not constantly, please).
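In MySQL, refreshing the statistics for the question's table is simply `ANALYZE TABLE GrabbedSites;`. To show what such statistics look like in a runnable form, the sketch below uses SQLite's ANALYZE command via Python's sqlite3 as a stand-in. SQLite stores its results in the sqlite_stat1 table. The sample data and the index name idx_category are hypothetical. SQLite's statistics format is its own and is not what MySQL keeps internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GrabbedSites "
    "(id INTEGER PRIMARY KEY, categoryID INTEGER, published TEXT)"
)
conn.execute("CREATE INDEX idx_category ON GrabbedSites (categoryID)")

# Hypothetical sample data: 1,000 rows spread evenly over 10 categories.
conn.executemany(
    "INSERT INTO GrabbedSites (categoryID, published) VALUES (?, ?)",
    [(i % 10, f"2010-09-{(i % 28) + 1:02d}") for i in range(1000)],
)

# Gather optimizer statistics; SQLite writes them to sqlite_stat1.
conn.execute("ANALYZE")
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()

# For a single-column index, 'stat' starts with "<rows in index> <avg rows per key>".
print(stats)
```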

Addendum

MySQL also limits the size of index keys: 1,000 bytes for MyISAM tables and 767 bytes per indexed column for InnoDB tables. Because MySQL generally uses only one index per table in each pseudo-SELECT, composite indexes (indexes on more than one column) are a good idea. Covering indexes, which contain every column a query reads, are better still. But it really comes down to testing the most common query and optimizing for it as best you can. The indexing priority should be:

  1. Primary key (MySQL creates the primary key index automatically)
  2. Foreign keys (the next most likely JOIN candidates)
  3. Filtration criteria (assuming you have the space)
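A covering index for the question's query might look like the sketch below. It again uses Python's sqlite3 as a stand-in, so UNIX_TIMESTAMP() is dropped and last_fetched is read as-is. The index name idx_published_covering is hypothetical. The filter/sort column goes first, followed by every other column the query reads, so the table rows themselves never need to be visited. SQLite reports this as "USING COVERING INDEX". MySQL shows "Using index" in EXPLAIN's Extra column. Note that the byte limits above may prevent adding long VARCHAR columns such as url or title to a MySQL index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE GrabbedSites (
        id INTEGER PRIMARY KEY,
        categoryID INTEGER,
        url TEXT, title TEXT, published TEXT,
        reddit_key TEXT, last_fetched TEXT, comment_link TEXT
    )
""")

# The question's query, minus MySQL-specific UNIX_TIMESTAMP().
QUERY = (
    "SELECT url, title, published, reddit_key, last_fetched, comment_link "
    "FROM GrabbedSites "
    "WHERE published BETWEEN '2010-09-02' AND '2010-09-03' "
    "ORDER BY published"
)

# Leading column serves the WHERE and ORDER BY; the rest cover the SELECT list.
conn.execute("""
    CREATE INDEX idx_published_covering ON GrabbedSites
        (published, url, title, reddit_key, last_fetched, comment_link)
""")

plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
print(plan)
```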
OMG Ponies
Also be aware that indexes only speed up data retrieval. They slow down INSERT, UPDATE, and DELETE statements, because the indexes need to be updated too.
OMG Ponies
Edited to add the query and the results of EXPLAIN.
hopeseekr