views:

98

answers:

2

I have this table:

CREATE TABLE  `maindb`.`daily_info` (
  `di_date` date NOT NULL,
  `di_sid` int(10) unsigned NOT NULL default '0',
  `di_type` int(10) unsigned NOT NULL default '0',
  `di_name` varchar(20) NOT NULL default '',
  `di_num` int(10) unsigned NOT NULL default '0',
  `di_abt` varchar(1) NOT NULL default 'a',
  PRIMARY KEY  (`di_date`,`di_sid`,`di_type`,`di_name`,`di_abt`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

When I use this query:

explain
SELECT  MONTH(di_date) as label1, DAYOFMONTH(di_date) as label2, sum(di_num) as count , di_abt as abt
FROM   `daily_info`
WHERE  di_sid=6
       AND di_type = 4
       AND di_name='clk-1'
       AND di_date > '2009-10-01' AND di_date < '2009-10-16'
GROUP BY
       DAYOFMONTH(di_date)
ORDER BY
       TO_DAYS(di_date) DESC

I get:

1, 'SIMPLE', 'daily_info', 'range', 'PRIMARY', 'PRIMARY', '3', '', 2500, 'Using where; Using temporary; Using filesort'

When actually if the key worked and the query would be filtered by di_date, di_sid and di_type, it would need to search only a few dozen rows.

What is wrong with the index (or query?)

Thanks!

+5  A: 

You use the range condition on the first index column which kills possibility to filter on other columns.

There is no single contiguous range in this index which would contain those and only those records that satisfy the condition.

MySQL is not able to do SKIP SCAN which would jump over the distinct values of di_date. That's why it does it's best: uses range access to filter on di_date and uses WHERE to filter on all other fields.

Either recreate the index as this (the best decision):

PRIMARY KEY  (`di_sid`,`di_type`,`di_name`,`di_date`,`di_abt`)

or, if you're unable to recreate the index, you can emulate the SKIP SCAN:

SELECT  MONTH(di.di_date) as label1, DAYOFMONTH(di.di_date) as label2, sum(di.di_num) as count , di.di_abt as abt
FROM    (
        SELECT  DISTINCT di_date
        FROM    daily_info
        WHERE   di_date > '2009-10-01' AND di_date < '2009-10-16'
        ) do
JOIN    daily_info di
ON      di.di_date <= do.di_date
        AND di.di_date>= do.di_date
        AND di_sid = 6
        AND di_type = 4
        AND di_name = 'clk-1'
GROUP BY
        DAYOFMONTH(di.di_date)
ORDER BY
        TO_DAYS(di.di_date) DESC

Make sure that Using index for group-by and Range checked for each record are present in the plan.

This condition:

di.date <= do.date
AND di.date >= do.date

is used instead of simple di.date = do.date to force the range checking.

See this article in my blog for more detailed explanation of emulating SKIP SCAN:

Update:

The latter query actually uses an equijoin and MySQL optimizes it without the tricks.

The trick above applies only to the ranged queries, i. e. when the innermost loop should use the range access, not the ref access.

It would be useful if you had to do something like di_name <= 'clk-1'

This query should work fine:

SELECT  MONTH(di.di_date) as label1, DAYOFMONTH(di.di_date) as label2, sum(di.di_num) as count , di.di_abt as abt
FROM    (
        SELECT  DISTINCT di_date
        FROM    daily_info
        WHERE   di_date > '2009-10-01' AND di_date < '2009-10-16'
        ) do
JOIN    daily_info di
ON      di.di_date = do.di_date
        AND di_sid = 6
        AND di_type = 4
        AND di_name = 'clk-1'
GROUP BY
        DAYOFMONTH(di.di_date)
ORDER BY
        TO_DAYS(di.di_date) DESC

Make sure that di uses ref access on the whole subkey possible here, with key_len = 33

Update 2

In your query, you are using these expressions out of the GROUP BY:

MONTH(di_date)
TO_DAYS(di_date)
di_abt

The query as it is now will sum all values for the 1st, 2nd etc. for any month and year.

I. e. for the first group it will add up all values from Jan 1st, 2000, then Feb 1st, 2000, etc.

Then it will return any random value of MONTH, any random value of TO_DAYS and any random value of di_abt from each group.

Your condition now is within a single month, so it's OK now, but if your condition will span multiple months (to say nothing of years), they query will produce unexpected results.

Do you really want to group by dates?

Quassnoi
Thank you Quassnoi.
Nir
Inedeed I did the 1st option and got filesort.I'll try the subselect option. Should I return the index to the original one?
Nir
You will get a filesort anyway, there is no way to get rid of it in this exact query. And I just noticed a little flaw in your query, see the post update.
Quassnoi
A: 

You are range-scanning the first part of the index - therefore it cannot use the subsequent parts of the index.

The way to improve this is to create another index with the fields in a different order which is more conducive to this particular query.

If your index was di_sid,di_type,di_date then it may be better.

MarkR