views:

67

answers:

1

I've downloaded the employees database and executed some queries for benchmarking purposes.
Then I noticed that one query didn't use a covering index, although there was a corresponding index that I created earlier. Only when I added a FORCE INDEX clause to the query, it used a covering index.
I've uploaded two files, one is the executed SQL queries and the other is the results.
Can you tell why the query uses a covering-index only when a FORCE INDEX clause is added? The EXPLAIN shows that in both cases, the index dept_no_from_date_idx is being used anyway.

To adapt myself to the standards of SO, I'm also writing the content of the two files here:

The SQL queries:

USE employees;

/* Creating an index for an index-covered query */
    CREATE INDEX dept_no_from_date_idx ON dept_emp (dept_no, from_date);

/* Show `dept_emp` table structure, indexes and generic data */
    SHOW TABLE STATUS LIKE "dept_emp";
    DESCRIBE dept_emp;
    SHOW KEYS IN dept_emp;

/* The EXPLAIN shows that the subquery doesn't use a covering-index */
EXPLAIN SELECT SQL_NO_CACHE * FROM dept_emp INNER JOIN (
        /* The subquery should use a covering index, but isn't */
        SELECT SQL_NO_CACHE emp_no, dept_no FROM dept_emp WHERE dept_no="d001" ORDER BY from_date DESC LIMIT 20000,50
    ) AS `der` USING (`emp_no`, `dept_no`);

/* The EXPLAIN shows that the subquery DOES use a covering-index,
        thanks to the FORCE INDEX clause */
EXPLAIN SELECT SQL_NO_CACHE * FROM dept_emp INNER JOIN (
        /* The subquery use a covering index */
        SELECT SQL_NO_CACHE emp_no, dept_no FROM dept_emp FORCE INDEX(dept_no_from_date_idx) WHERE dept_no="d001" ORDER BY from_date DESC LIMIT 20000,50
    ) AS `der` USING (`emp_no`, `dept_no`);

The results:

--------------
/* Creating an index for an index-covered query */
    CREATE INDEX dept_no_from_date_idx ON dept_emp (dept_no, from_date)
--------------

Query OK, 331603 rows affected (33.95 sec)
Records: 331603  Duplicates: 0  Warnings: 0

--------------
/* Show `dept_emp` table structure, indexes and generic data */
    SHOW TABLE STATUS LIKE "dept_emp"
--------------

+----------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| Name     | Engine | Version | Row_format | Rows   | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment |
+----------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| dept_emp | InnoDB |      10 | Compact    | 331883 |             36 |    12075008 |               0 |     21544960 |  29360128 |           NULL | 2010-05-04 13:07:49 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
+----------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
1 row in set (0.47 sec)

--------------
    DESCRIBE dept_emp
--------------

+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   | PRI | NULL    |       |
| dept_no   | char(4) | NO   | PRI | NULL    |       |
| from_date | date    | NO   |     | NULL    |       |
| to_date   | date    | NO   |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.05 sec)

--------------
    SHOW KEYS IN dept_emp
--------------

+----------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table    | Non_unique | Key_name              | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| dept_emp |          0 | PRIMARY               |            1 | emp_no      | A         |      331883 |     NULL | NULL   |      | BTREE      |         |
| dept_emp |          0 | PRIMARY               |            2 | dept_no     | A         |      331883 |     NULL | NULL   |      | BTREE      |         |
| dept_emp |          1 | emp_no                |            1 | emp_no      | A         |      331883 |     NULL | NULL   |      | BTREE      |         |
| dept_emp |          1 | dept_no               |            1 | dept_no     | A         |           7 |     NULL | NULL   |      | BTREE      |         |
| dept_emp |          1 | dept_no_from_date_idx |            1 | dept_no     | A         |          13 |     NULL | NULL   |      | BTREE      |         |
| dept_emp |          1 | dept_no_from_date_idx |            2 | from_date   | A         |      165941 |     NULL | NULL   |      | BTREE      |         |
+----------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.23 sec)

--------------
/* The EXPLAIN shows that the subquery doesn't use a covering-index */
EXPLAIN SELECT SQL_NO_CACHE * FROM dept_emp INNER JOIN (
        /* The subquery should use a covering index, but isn't */
        SELECT SQL_NO_CACHE emp_no, dept_no FROM dept_emp WHERE dept_no="d001" ORDER BY from_date DESC LIMIT 20000,50
    ) AS `der` USING (`emp_no`, `dept_no`)
--------------

+----+-------------+------------+--------+----------------------------------------------+-----------------------+---------+------------------------+-------+-------------+
| id | select_type | table      | type   | possible_keys                                | key                   | key_len | ref                    | rows  | Extra       |
+----+-------------+------------+--------+----------------------------------------------+-----------------------+---------+------------------------+-------+-------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                                         | NULL                  | NULL    | NULL                   |    50 |             |
|  1 | PRIMARY     | dept_emp   | eq_ref | PRIMARY,emp_no,dept_no,dept_no_from_date_idx | PRIMARY               | 16      | der.emp_no,der.dept_no |     1 |             |
|  2 | DERIVED     | dept_emp   | ref    | dept_no,dept_no_from_date_idx                | dept_no_from_date_idx | 12      |                        | 21402 | Using where |
+----+-------------+------------+--------+----------------------------------------------+-----------------------+---------+------------------------+-------+-------------+
3 rows in set (0.09 sec)

--------------
/* The EXPLAIN shows that the subquery DOES use a covering-index,
        thanks to the FORCE INDEX clause */
EXPLAIN SELECT SQL_NO_CACHE * FROM dept_emp INNER JOIN (
        /* The subquery use a covering index */
        SELECT SQL_NO_CACHE emp_no, dept_no FROM dept_emp FORCE INDEX(dept_no_from_date_idx) WHERE dept_no="d001" ORDER BY from_date DESC LIMIT 20000,50
    ) AS `der` USING (`emp_no`, `dept_no`)
--------------

+----+-------------+------------+--------+----------------------------------------------+-----------------------+---------+------------------------+-------+--------------------------+
| id | select_type | table      | type   | possible_keys                                | key                   | key_len | ref                    | rows  | Extra                    |
+----+-------------+------------+--------+----------------------------------------------+-----------------------+---------+------------------------+-------+--------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                                         | NULL                  | NULL    | NULL                   |    50 |                          |
|  1 | PRIMARY     | dept_emp   | eq_ref | PRIMARY,emp_no,dept_no,dept_no_from_date_idx | PRIMARY               | 16      | der.emp_no,der.dept_no |     1 |                          |
|  2 | DERIVED     | dept_emp   | ref    | dept_no_from_date_idx                        | dept_no_from_date_idx | 12      |                        | 37468 | Using where; Using index |
+----+-------------+------------+--------+----------------------------------------------+-----------------------+---------+------------------------+-------+--------------------------+
3 rows in set (0.05 sec)

Bye

Edit:
I noticed that there's quite a significant execution speed differences between that last two queries, the results are placed before you:

SELECT SQL_NO_CACHE * FROM dept_emp INNER JOIN (
    SELECT SQL_NO_CACHE emp_no, dept_no FROM dept_emp WHERE dept_no="d001" ORDER BY from_date DESC LIMIT 20000,50
) AS `der` USING (`emp_no`, `dept_no`)
--------------

+--------+---------+------------+------------+
| emp_no | dept_no | from_date  | to_date    |
+--------+---------+------------+------------+
|  38552 | d001    | 1985-04-16 | 2000-10-20 |
            ... omitted ...
|  98045 | d001    | 1985-03-28 | 9999-01-01 |
+--------+---------+------------+------------+
50 rows in set (0.31 sec)

--------------
SELECT SQL_NO_CACHE * FROM dept_emp INNER JOIN (
    SELECT SQL_NO_CACHE emp_no, dept_no FROM dept_emp FORCE INDEX(dept_no_from_date_idx) WHERE dept_no="d001" ORDER BY from_date DESC LIMIT 20000,50
) AS `der` USING (`emp_no`, `dept_no`)
--------------

+--------+---------+------------+------------+
| emp_no | dept_no | from_date  | to_date    |
+--------+---------+------------+------------+
|  38552 | d001    | 1985-04-16 | 2000-10-20 |
            ... omitted ...
|  98045 | d001    | 1985-03-28 | 9999-01-01 |
+--------+---------+------------+------------+
50 rows in set (0.06 sec)

BUT, if I change the order of execution (making the last query to be executed first, and the first query to be executed last), then the execution speed is the same:

--------------
SELECT SQL_NO_CACHE * FROM dept_emp INNER JOIN (
    SELECT SQL_NO_CACHE emp_no, dept_no FROM dept_emp FORCE INDEX(dept_no_from_date_idx) WHERE dept_no="d001" ORDER BY from_date DESC LIMIT 20000,50
) AS `der` USING (`emp_no`, `dept_no`)
--------------

+--------+---------+------------+------------+
| emp_no | dept_no | from_date  | to_date    |
+--------+---------+------------+------------+
|  38552 | d001    | 1985-04-16 | 2000-10-20 |
            ... omitted ...
|  98045 | d001    | 1985-03-28 | 9999-01-01 |
+--------+---------+------------+------------+
50 rows in set (0.08 sec)

--------------
SELECT SQL_NO_CACHE * FROM dept_emp INNER JOIN (
    SELECT SQL_NO_CACHE emp_no, dept_no FROM dept_emp WHERE dept_no="d001" ORDER BY from_date DESC LIMIT 20000,50
) AS `der` USING (`emp_no`, `dept_no`)
--------------

+--------+---------+------------+------------+
| emp_no | dept_no | from_date  | to_date    |
+--------+---------+------------+------------+
|  38552 | d001    | 1985-04-16 | 2000-10-20 |
            ... omitted ...
|  98045 | d001    | 1985-03-28 | 9999-01-01 |
+--------+---------+------------+------------+
50 rows in set (0.08 sec)

It can't be that the second query is being taken from the cache, because that SQL_NO_CACHE is written in both queries. So why in the first example the first query took 0.31 sec and the second 0.06 sec, but in the second example, both queries take 0.08 sec ?

Edit2:
I think that the execution speed differences derives from the OS cache and perhaps other factors. When executing the above 2 queries repeatedly, the execution time differences become negligible. I executed the above 2 queries for 3 times repeatedly and got the following results:

#1: 0.08 sec
#2: 0.03 sec
#1: 0.05 sec
#2: 0.05 sec
#1: 0.03 sec
#2: 0.05 sec
+1  A: 

In fact, both your queries do use the covering index.

Your index definition does not include emp_no, so in MyISAM, Using index would be impossible even with the FORCE INDEX clause.

However, InnoDB tables are clustered and every index implicitly contains the PRIMARY KEY as a record pointer.

This means that your index is in fact an index on (dept_no, from_date, emp_no, dept_no) and hence contains all the fields required.

EXPLAIN PLAN does not always reflect this correctly, but InnoDB engine does cope with this.

You can check it by comparing performance of these two queries:

SELECT SQL_NO_CACHE * FROM dept_emp INNER JOIN (
        /* The subquery use a covering index */
        SELECT SQL_NO_CACHE from_date, emp_no, dept_no FROM dept_emp WHERE dept_no="d001" ORDER BY from_date DESC LIMIT 20000,50
    ) AS `der` USING (`emp_no`, `dept_no`);

and

SELECT SQL_NO_CACHE * FROM dept_emp INNER JOIN (
        /* The subquery use a covering index */
        SELECT SQL_NO_CACHE to_date, emp_no, dept_no FROM dept_emp WHERE dept_no="d001" ORDER BY from_date DESC LIMIT 20000,50
    ) AS `der` USING (`emp_no`, `dept_no`);

You will see that despite the fact that the plans will be shown as identical, the second query will take much more time (exactly because to_date is not covered).

It's a bug in EXPLAIN PLAN, not in InnoDB engine.

Quassnoi
Indeed there was a major difference: the first query took 0.25 sec, the second took 2.17 sec. I'm familiar with InnoDB's structure etc. I'm looking for the bug report in mysql.com but can't find it. Do you know where is the report? Thanks :)
Dor
@dor: http://bugs.mysql.com/report.php
Quassnoi
You mean that **I** should report on this bug? Until now, no one noticed? Neat =D
Dor
@Quassnoi: Something seems weird about the execution speed, please check out my edit. Thanks
Dor
@Quassnoi, nice detective work.
Marcus Adams
@Quassnoi: Never mind, I think I solved it in my second edit.
Dor