views:

117

answers:

3

My MySQL is not strong, so please forgive any rookie mistakes. Short version:

SELECT locId,count,avg FROM destAgg_geo is significantly slower than SELECT * from destAgg_geo

prtt.destAgg is a table keyed on dst_ip (PRIMARY)

mysql> describe prtt.destAgg;
+---------+------------------+------+-----+---------+-------+
| Field   | Type             | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| dst_ip  | int(10) unsigned | NO   | PRI | 0       |       |
| total   | float unsigned   | YES  |     | NULL    |       |
| avg     | float unsigned   | YES  |     | NULL    |       |
| sqtotal | float unsigned   | YES  |     | NULL    |       |
| sqavg   | float unsigned   | YES  |     | NULL    |       |
| count   | int(10) unsigned | YES  |     | NULL    |       |
+---------+------------------+------+-----+---------+-------+

geoip.blocks is a table keyed on both startIpNum and endIpNum (PRIMARY)

mysql> describe geoip.blocks;
+------------+------------------+------+-----+---------+-------+
| Field      | Type             | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| startIpNum | int(10) unsigned | NO   | MUL | NULL    |       |
| endIpNum   | int(10) unsigned | NO   |     | NULL    |       |
| locId      | int(10) unsigned | NO   |     | NULL    |       |
+------------+------------------+------+-----+---------+-------+

destAgg_geo is a view:

CREATE VIEW destAgg_geo AS SELECT * FROM destAgg JOIN geoip.blocks 
  ON destAgg.dst_ip BETWEEN geoip.blocks.startIpNum AND geoip.blocks.endIpNum;

Here's the optimization plan for select *:

mysql> explain select * from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows    | Extra                                          |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
|  1 | SIMPLE      | blocks  | ALL  | start_end     | NULL | NULL    | NULL | 3486646 |                                                |
|  1 | SIMPLE      | destAgg | ALL  | PRIMARY       | NULL | NULL    | NULL |  101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+

Here's the optimization plan for select with specific columns:

mysql> explain select locId,count,avg from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows    | Extra                                          |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
|  1 | SIMPLE      | destAgg | ALL  | PRIMARY       | NULL | NULL    | NULL |  101893 |                                                |
|  1 | SIMPLE      | blocks  | ALL  | start_end     | NULL | NULL    | NULL | 3486646 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+

Here's the optimization plan for every column from destAgg and just the locId column from geoip.blocks:

mysql> explain select dst_ip,total,avg,sqtotal,sqavg,count,locId from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows    | Extra                                          |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
|  1 | SIMPLE      | blocks  | ALL  | start_end     | NULL | NULL    | NULL | 3486646 |                                                |
|  1 | SIMPLE      | destAgg | ALL  | PRIMARY       | NULL | NULL    | NULL |  101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+

Remove any column except dst_ip and the range check flips to blocks:

mysql> explain select dst_ip,avg,sqtotal,sqavg,count,locId from destAgg_geo;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows    | Extra                                          |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
|  1 | SIMPLE      | destAgg | ALL  | PRIMARY       | NULL | NULL    | NULL |  101893 |                                                |
|  1 | SIMPLE      | blocks  | ALL  | start_end     | NULL | NULL    | NULL | 3486646 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+

which is then much slower. What's going on here?

(Yes, I could just use the * query results and process from there, but I would like to know what's happening and why)

EDIT -- EXPLAIN on the VIEW query:

mysql> explain SELECT * FROM destAgg JOIN geoip.blocks ON destAgg.dst_ip BETWEEN geoip.blocks.startIpNum AND geoip.blocks.endIpNum;
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows    | Extra                                          |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
|  1 | SIMPLE      | blocks  | ALL  | start_end     | NULL | NULL    | NULL | 3486646 |                                                |
|  1 | SIMPLE      | destAgg | ALL  | PRIMARY       | NULL | NULL    | NULL |  101893 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+------+---------------+------+---------+------+---------+------------------------------------------------+
+1  A: 

MySQL can tell you if you run EXPLAIN PLAN on both queries.

The first query with the columns doesn't include any key columns, so my guess is it has to do a TABLE SCAN.

The second query with the "SELECT *" includes the primary key, so it can use the index.

duffymo
EXPLAIN PLAN gives me an error 1064 (reserved keyword):mysql> EXPLAIN PLAN select * from destAgg_geo;ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'select * from destAgg_geo' at line 1 mysql> EXPLAIN PLAN (select * from destAgg_geo);ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(select * from destAgg_geo)' at line 1
Pat
EXPLAIN PLAN is Oracle syntax. MySQL simply uses EXPLAIN followed by the query.
Bill Karwin
Thank you, Bill.
duffymo
A: 

I would try putting a compisite index on locId,count,avg and see if that doesn't improve speed.

HLGEM
+1  A: 

The range filter is applied last, so the problem is that the query optimizer is choosing to join the larger table first in one case, and the smaller table first in another. Perhaps someone with more knowledge of the optimizer can tell us why it's joining the tables in a different order for each.

I think the real goal here should be to try to get the JOIN to use an index, so the order of the join wouldn't matter so much.

Marcus Adams
Any tips on how to do that?
Pat