views:

26

answers:

2

Suppose I have the following tables:

CREATE TABLE Game (
    GameID INT UNSIGNED NOT NULL,
    GameType TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (GameID),
    INDEX Index_GameType (GameType, GameID)
) ENGINE=INNODB

CREATE TABLE HighScore (
    Game INT UNSIGNED NOT NULL,
    Score SMALLINT UNSIGNED,
    PRIMARY KEY (Game),
    INDEX Index_Score (Score, Game),
    CONSTRAINT Constr_Score_Game_fk
        FOREIGN KEY Score_Game_fk (Game) REFERENCES Game (GameID)
) ENGINE=INNODB

(These are slimmed-down versions of real tables I am working with. The real tables have more columns and indexes. The above tables capture the essential features of the situation.)

(The number of different GameTypes should be assumed to be small, so that the index Index_GameType is not very selective.)

Suppose I run the following query:

SELECT
    HighScore.Score
FROM
    HighScore
    JOIN Game ON HighScore.Game = Game.GameID
WHERE
    Game.GameType = 42
ORDER BY
    HighScore.Score DESC
LIMIT 50

Looking at this query and the table design, we can probably agree that the sensible thing would be to scan down the HighScore table and join rows until 50 have been found for which the WHERE condition is satisfied. However, an EXPLAIN for me showed (using my real, more complicated tables) that MySQL actually planned to look for all rows in Game satisfying the WHERE condition, join this with HighScore, and do a filesort to get the rows in sorted order.

It seemed sensible therefore to specify a STRAIGHT_JOIN instead in the above query. Now the EXPLAIN output indicates that the first table, HighScore, is "using index" (as expected), but the number of rows reported appears to be the number of rows in the HighScore table. Should I take this to mean that MySQL plans to take essentially the entire index, join every row in that index to the other table, and only then throw away rows below the top 50? That seems ridiculous, but I'm not sure if that's what it would actually do. Does anyone have any idea?

+2  A: 

Since the fields your are filtering and ordering on are in different tables, they cannot be covered by a single index.

If you add a STRAIGHT_JOIN clause, you force MySQL to take every record from HighScore (using index on Score), find the matching record in Game, check if it's 42 and return (or neglect) it.

Since MySQL cannot tell in advance how many records will match, it will assume the worst and just show the total number of HighScore records in the plan.

In reality, the query will stop after 50 mathing records will be returned.

Quassnoi
That's what I was hoping would be the case. Can you point me to any documentation that confirms that's what MySQL will do?
Hammerite
@Hammerite: this is not documented, but you can easily check it by comparing `key_read_requests` for the queries with and without `LIMIT` clauses (in `MyISAM`, of course).
Quassnoi
See my answer below.
Hammerite
A: 

This answer expands upon the information given by Quassnoi. I use an answer rather than a comment in order to have more space.

I tested running the query with and without the LIMIT clause, as suggested by Quassnoi. Since I am using InnoDB rather than MyISAM, I used the following query to get the number of read requests:

select
    variable_value
from
    information_schema.GLOBAL_STATUS
where
    variable_name = 'innodb_buffer_pool_read_requests';

Before running any queries, this gave 87131. After running the query without the LIMIT clause, it gave 170381. After running the query with the LIMIT clause, it gave 175315.

So the number of read requests involved in the query without LIMIT seems to have been 170381 - 87131 = 83250, while the number of read requests involved in the query with LIMIT seems to have been 175315 - 170381 = 4934. Approximately the same numbers showed up on repeating the experiment. These numbers don't seem to correspond to rows, indeed I'm not sure what they do correspond to in terms of the data fetched*, but what they do seem to show is that verifiably less data was fetched from the disk when the LIMIT query was added. As such I'm inclined to think that Quassnoi is correct and that MySQL does indeed use a sensible strategy for fetching the limited number of rows.

  • The number of read requests involved in the no-LIMIT query is roughly 17 times the number from the other query, but there are much more than 17 * 50 results returned, so it doesn't seem to correspond directly to number of results.
Hammerite