Hi guys, I'm trying to optimize a relatively big MySQL (MyISAM) table with 220,000 rows. The table itself is not that big, about 23.5MB in size. So what's the real problem? I have a query like this:

SELECT * FROM table WHERE DATE_FORMAT(date_field, '%m%d') = '1128' LIMIT 10

I tried putting an index on date_field, but EXPLAIN shows that the index is not used at all. I guess that's not so strange, because wrapping the column in DATE_FORMAT() means the function has to be evaluated for every row. So I'm planning to add another column that holds the dates as '%m%d' and put an index on it. The only reason I don't want to do this is the data duplication.
Btw, date_field is a birthdate field, and I'm sure I will only ever need it as %Y-%m-%d or just %m%d.

Do you have a better suggestion for optimizing the query above? Thanks in advance!

Some info:

MySQL version: 5.0.51b-log
OS: slackware 12.1
CPU: Pentium III (Coppermine) at 996.783MHz
RAM: 512MB DDR
HDD: 80GB SATA

P.S. I tried adding another column that holds the dates as %m%d. The results are very good, but I still don't like this approach. I'm waiting for more suggestions!
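
For reference, the extra-column approach looks roughly like the sketch below. The column name birth_mmdd and the index name idx_birth_mmdd are made up for illustration, and `table` is the same placeholder table name as in the query above.

```sql
-- Hypothetical names: birth_mmdd, idx_birth_mmdd. `table` is a placeholder.
-- Add a nullable column holding the month and day as a 4-character string.
ALTER TABLE `table` ADD COLUMN birth_mmdd CHAR(4);

-- Fill it once from the existing birthdate values.
UPDATE `table` SET birth_mmdd = DATE_FORMAT(date_field, '%m%d');

-- Index the new column so the lookup no longer needs a full scan.
CREATE INDEX idx_birth_mmdd ON `table` (birth_mmdd);

-- The original query, rewritten to compare the indexed column directly.
SELECT * FROM `table` WHERE birth_mmdd = '1128' LIMIT 10;
```

The duplicated column has to be kept in sync whenever date_field is inserted or updated, either in the application or with a trigger.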

+1  A: 

If you always need a wildcard on the year, as in your query, I don't think MySQL will be able to use an index on a date/datetime field.

If these are only dates, though, you can create a time_dimension table and prepopulate it with a calendar for the next handful of years. I have a stored procedure to do that if you need one.

create table time_dimension (
 dbdate date primary key,
 year int NOT NULL,
 month int NOT NULL,
 day int NOT NULL,
 KEY(year),
 KEY(month),
 KEY(day)
);
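
A procedure along the following lines could populate the table. This is only a sketch and not the stored procedure mentioned above: the name fill_time_dimension and its two parameters are made up for illustration, and the date range in the CALL is an arbitrary example.

```sql
-- Hypothetical helper: inserts one row per calendar day in [start_date, end_date].
DELIMITER //
CREATE PROCEDURE fill_time_dimension(IN start_date DATE, IN end_date DATE)
BEGIN
  DECLARE d DATE;
  SET d = start_date;
  WHILE d <= end_date DO
    -- Split each date into the year/month/day columns defined above.
    INSERT INTO time_dimension (dbdate, year, month, day)
    VALUES (d, YEAR(d), MONTH(d), DAY(d));
    SET d = DATE_ADD(d, INTERVAL 1 DAY);
  END WHILE;
END //
DELIMITER ;

-- Example call; pick a range that covers every date stored in the data table.
CALL fill_time_dimension('1900-01-01', '2049-12-31');
```

The range has to cover every value of date_field, since rows whose date has no match in time_dimension drop out of an inner join.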

You'd join your big data table to this relatively small table and filter on its fields, e.g.

SELECT * FROM data_table d 
  inner join time_dimension t 
    on d.date_field=t.dbdate 
where t.day=28 and t.month=11 LIMIT 10

This leverages filtering on the small time_dimension table, and the join on date_field = dbdate will normally use indexes.

nos
Thanks for the suggestion, nos! I don't think it's very useful in my case (I added some notes), but in general this is a good direction!
plamen
It's good for the future too. If you need to filter all Thursdays, you'd just add another column to time_dimension and use where t.dayname = 'Thursday'. Or all weekends in a month becomes where t.year=2009 and t.month=9 and t.weekend_flag = 1 :-)
nos
That doesn't really help when dealing with birthdates unless the database is geared toward infants.
jmucchiello
A time_dimension table covering 150 years is only about 55k rows. Filtering on that using indexes is going to beat a sequential scan on 200k rows, and beat it by more as that 200k table grows.
nos
Seems interesting. Is it possible for me to get a dump of the time_dimension table mentioned above?
shantanuo