tags:

views:

138

answers:

4

I have a mysql query that's taking several minutes which isn't very good as it's used to create a web page.

Three tables are used: poster_data contains information on individual posters. poster_categories lists all the categories (movies, art, etc) while poster_prodcat lists the posterid number and the categories it can be in e.g. one poster would have multiple lines for say, movies, indiana jones, harrison ford, adventure films, etc.

this is the slow query:

select * 
  from poster_prodcat, 
       poster_data, 
       poster_categories 
 where poster_data.apnumber = poster_prodcat.apnumber 
   and poster_categories.apcatnum = poster_prodcat.apcatnum 
   and poster_prodcat.apcatnum='623'  
ORDER BY aptitle ASC 
   LIMIT 0, 32

According to the explain:

explain

It was taking a few minutes. Poster_data has just over 800,000 rows, while poster_prodcat has just over 17 million. Other category queries with this select are barely noticeable, while poster_prodcat.apcatnum='623' has about 400,000 results and is writing out to disk

A: 

The query you listed is how the final query will look like? (So they have the apcatnum=/ID/ ?)

where poster_data.apnumber=poster_prodcat.apnumber and poster_categories.apcatnum=poster_prodcat.apcatnum and poster_prodcat.apcatnum='623'

poster_prodcat.apcatnum='623' will vastly decrease the data-set mysql has to work on, thus this should be the first parsed part of the query.

Then go on to swap the where-comparisons so those minimizing the data-set the most will be parsed first.

You may also want to try sub-queries. I’m not sure that will help, but mysql probably won’t first get all 3 tables, but first do the sub-query and then the other one. This should minimize memory consumption while querying. Although this is not an option if you really want to select all columns (as you’re using a * there).

Kissaki
Ok, just tried that. Strangely, moving poster_prodcat.apcatnum='623' to be the first of the where's returned 0 rows, whereas there are 422,777 posters in that category.
Ian
A: 

You need to have an index on apnumber in POSTER_DATA. Scanning 841,152 records is killing the performance.

I do have an index on that column:Keyname: posterid Type:unique, Cardinality: 841152, Field:apnumber
Ian
A: 

Looks like the query is using the apptitle index to get the ordering but it is doing a full scan to filter the results. I think it might help if you have a composite index across both apptitle and apnumber on poster_data. MySQL might then be able to use this to do both the sort order and the filter.

create index data_title_anum_idx on poster_data(aptitle,apnumber);
Adam Butler
+1  A: 

hope you find this helpful - http://pastie.org/1105206

drop table if exists poster;
create table poster
(
poster_id int unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb; 


drop table if exists category;
create table category
(
cat_id mediumint unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb; 

drop table if exists poster_category;
create table poster_category
(
cat_id mediumint unsigned not null,
poster_id int unsigned not null,
primary key (cat_id, poster_id) -- note the clustered composite index !!
)
engine = innodb;

-- FYI http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

select count(*) from category
count(*)
========
500,000


select count(*) from poster
count(*)
========
1,000,000

select count(*) from poster_category
count(*)
========
125,675,688

select count(*) from poster_category where cat_id = 623
count(*)
========
342,820

explain
select
 p.*,
 c.*
from
 poster_category pc
inner join category c on pc.cat_id = c.cat_id
inner join poster p on pc.poster_id = p.poster_id
where
 pc.cat_id = 623
order by
 p.name
limit 32;

id  select_type table   type    possible_keys   key     key_len ref                         rows
==  =========== =====   ====    =============   ===     ======= ===                         ====
1   SIMPLE      c       const   PRIMARY         PRIMARY 3       const                       1   
1   SIMPLE      p       index   PRIMARY         name    257     null                        32  
1   SIMPLE      pc      eq_ref  PRIMARY         PRIMARY 7       const,foo_db.p.poster_id    1   

select
 p.*,
 c.*
from
 poster_category pc
inner join category c on pc.cat_id = c.cat_id
inner join poster p on pc.poster_id = p.poster_id
where
 pc.cat_id = 623
order by
 p.name
limit 32;

Statement:21/08/2010 
0:00:00.021: Query OK
f00
Can I ask why you chose innodb? (I'm not really up on the differences.)
Ian
did you check the explain plan and the resultant query speed ?clustered indexes in a word.
f00
you might want to take a look at this too http://tag1consulting.com/MySQL_Engines_MyISAM_vs_InnoDB
f00
just to clarify 1 million posters, 500K categories, 125 million poster categories and a runtime of 0.02 seconds for cat_id = 623 (300K+ rows)
f00
I made innodb copies of those tables and tried out the query. On the query with both inner joins I rec'd an empty result set after 3 mins. Removing the first inner join returned the 32 rows in 20.13 seconds. However, running my original query returned the set in .10 seconds, so I'm happy. Just curious why the inner joins took so long. Thanks for your help.
Ian
no problem with the help - shame you couldnt get the innodb solution to work for you :(
f00
No, the innodb tables were faster...that's where I got the .10 sec response. Just didn't work with the inner joins for some reason. But I'm happy with the speed using my query on your tables. :-)
Ian
oh sorry i misunderstood - well maybe i can help you sort out the query ?? If you found my answer helpful could you accept it so i get the karma points thanks :)
f00
you probably need to tune your innodb settings - the default ones wont suit your requirements :)
f00