views:

510

answers:

3

Hi All.

I need to optimize a MYSQL query doing an order by. No matter what I do, mysql ends up doing a filesort instead of using the index.

Here's my table ddl... (Yes, In this case the DAYSTAMP and TIMESTAMP columns are exactly the same).

CREATE TABLE DB_PROBE.TBL_PROBE_DAILY ( DAYSTAMP date NOT NULL, TIMESTAMP date NOT NULL, SOURCE_ADDR varchar(64) NOT NULL, SOURCE_PORT int(10) NOT NULL, DEST_ADDR varchar(64) NOT NULL, DEST_PORT int(10) NOT NULL, PACKET_COUNT int(20) NOT NULL, BYTES int(20) NOT NULL, UNIQUE KEY IDX_TBL_PROBE_DAILY_05 (DAYSTAMP,SOURCE_ADDR(16),SOURCE_PORT,DEST_ADDR(16),DEST_PORT,TIMESTAMP), KEY IDX_TBL_PROBE_DAILY_01 (SOURCE_ADDR(16),TIMESTAMP), KEY IDX_TBL_PROBE_DAILY_02 (DEST_ADDR(16),TIMESTAMP), KEY IDX_TBL_PROBE_DAILY_03 (SOURCE_PORT,TIMESTAMP), KEY IDX_TBL_PROBE_DAILY_04 (DEST_PORT,TIMESTAMP), KEY IDX_TBL_PROBE_DAILY_06 (DAYSTAMP,TIMESTAMP,BYTES) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (to_days(DAYSTAMP)) (PARTITION TBL_PROBE_DAILY_P20100303 VALUES LESS THAN (734200) ENGINE = InnoDB, PARTITION TBL_PROBE_DAILY_P20100304 VALUES LESS THAN (734201) ENGINE = InnoDB, PARTITION TBL_PROBE_DAILY_P20100305 VALUES LESS THAN (734202) ENGINE = InnoDB, PARTITION TBL_PROBE_DAILY_P20100306 VALUES LESS THAN (734203) ENGINE = InnoDB) */;

The partitions are daily and I've added IDX_TBL_PROBE_DAILY_06 especially for the query I'm trying to get working, which is:

select SOURCE_ADDR as 'Source_IP', SOURCE_PORT as 'Source_Port', DEST_ADDR as 'Destination_IP', DEST_PORT as 'Destination_Port', BYTES from TBL_PROBE_DAILY where DAYSTAMP >= '2010-03-04' and DAYSTAMP <= '2010-03-04' and TIMESTAMP >= FROM_UNIXTIME(1267653600) and TIMESTAMP <= FROM_UNIXTIME(1267687228) order by bytes desc limit 20;

The explain plan as follows: +----+-------------+-----------------+---------------------------+-------+-----------------------------------------------+------------------------+---------+------+--------+-----------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+---------------------------+-------+-----------------------------------------------+------------------------+---------+------+--------+-----------------------------+ | 1 | SIMPLE | TBL_PROBE_DAILY | TBL_PROBE_DAILY_P20100304 | range | IDX_TBL_PROBE_DAILY_05,IDX_TBL_PROBE_DAILY_06 | IDX_TBL_PROBE_DAILY_05 | 3 | NULL | 216920 | Using where; Using filesort | +----+-------------+-----------------+---------------------------+-------+-----------------------------------------------+------------------------+---------+------+--------+-----------------------------+

I've also tried to FORCE INDEX (IDX_TBL_PROBE_DAILY_06) , in which case it happily uses IDX_06 to satisfy the where constraints, but still does a filesort :(

I cant imagine index sorting impossible on partitioned tables? InnoDB behaves different to MyISAM in this regard? I would have thought InnoDBs index+data caching to be ideal for index sorting.

Any help will be much appreciated... I've been trying all week to optimize this query in different ways, without much success.

A: 

I couldn't read the definition. Here it is formatted:

CREATE TABLE DB_PROBE.TBL_PROBE_DAILY ( 
  DAYSTAMP date NOT NULL, 
  TIMESTAMP date NOT NULL, 
  SOURCE_ADDR varchar(64) NOT NULL, 
  SOURCE_PORT int(10) NOT NULL, 
  DEST_ADDR varchar(64) NOT NULL, 
  DEST_PORT int(10) NOT NULL, 
  PACKET_COUNT int(20) NOT NULL, 
  BYTES int(20) NOT NULL, 
UNIQUE KEY IDX_TBL_PROBE_DAILY_05 (DAYSTAMP,SOURCE_ADDR(16),SOURCE_PORT,
                                   DEST_ADDR(16),DEST_PORT,TIMESTAMP), 
KEY IDX_TBL_PROBE_DAILY_01 (SOURCE_ADDR(16),TIMESTAMP), 
KEY IDX_TBL_PROBE_DAILY_02 (DEST_ADDR(16),TIMESTAMP), 
KEY IDX_TBL_PROBE_DAILY_03 (SOURCE_PORT,TIMESTAMP), 
KEY IDX_TBL_PROBE_DAILY_04 (DEST_PORT,TIMESTAMP), 
KEY IDX_TBL_PROBE_DAILY_06 (DAYSTAMP,TIMESTAMP,BYTES) 
) ENGINE=InnoDB DEFAULT CHARSET=latin1  

/*!50100 PARTITION BY RANGE (to_days(DAYSTAMP)) 

(PARTITION TBL_PROBE_DAILY_P20100303 VALUES LESS THAN (734200) ENGINE = InnoDB, 
 PARTITION TBL_PROBE_DAILY_P20100304 VALUES LESS THAN (734201) ENGINE = InnoDB, 
 PARTITION TBL_PROBE_DAILY_P20100305 VALUES LESS THAN (734202) ENGINE = InnoDB, 
 PARTITION TBL_PROBE_DAILY_P20100306 VALUES LESS THAN (734203) ENGINE = InnoDB) */;

The Query:

select SOURCE_ADDR as 'Source_IP',    
       SOURCE_PORT as 'Source_Port', 
       DEST_ADDR as 'Destination_IP', 
       DEST_PORT as 'Destination_Port', 
       BYTES 
from TBL_PROBE_DAILY 
where DAYSTAMP >= '2010-03-04' and DAYSTAMP <= '2010-03-04' 
  and TIMESTAMP >= FROM_UNIXTIME(1267653600) and TIMESTAMP <= FROM_UNIXTIME(1267687228) 
order by bytes desc limit 20;

I suspect the problem is that your query contains two range queries. I my experience, MySQL cannot optimise beyond the first range query it encounters, and so as far as it is concerned, any index beginning with DAYSTAMP is equivalent to any other.

The clue in the explain is key length: this shows how much of the index value actually gets used. It is probably the same value (3) even when you force it to use the index you want.

Martin
Thanx for the hint.Should I drop DAYSTAMP off the index or drop the TIMESTAMP column all together (which I can do on this table). I guess I'll try both. Just takes forever to re-index.
Strahd_za
Only you know the meaning and shape of your data. I don't know what your TIMESTAMP and DAYSTAMP fields mean, so I cannot advise. For speed, can you just take an extract from the data and experiment on that ?
Martin
Unfortunately no luck. I tried just dropping daystamp off the index, hoping that in the where clause daystamp would prune the partitions and a single range query on the index would help, but didnt. Then I just dropped and recreated the table (I can generate new data quick enough), without the timestamp field and only keeping IDX_05 and IDX_06 (the only ones I need for this experiment)... but without the timestamp column obviously. Still the explain plan indicates a filesort and key_len = 3 (which I'm still not entirely sure what that means).
Strahd_za
http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html explains that a DATE field takes 3 bytes. The key-len in the explain indicates how many bytes of the index value is actually used in the comparison - in your case 3 bytes, meaning only the first DATE component is used.
Martin
Another approach might be to expand your DAYSTAMP query into an IN clause: this might avoid the range scan on DAYSTAMP, and allow your second index component to be used. Eg. "where DAYSTAMP IN ('2010-03-04', '2010-03-05' )".
Martin
A: 

Ok. Looks like swapping the columns in the index did the trick.
I don't really know why... maybe someone else has an explanation?

Either way, if I add an index

create index IDX_TBL_PROBE_DAILY_07 on TBL_PROBE_DAILY(BYTES,DAYSTAMP)   

then mysql favors IDX07 (even without the force index) and does an index sort instead of file sort.

Strahd_za