views:

783

answers:

2

I have a browse category query that im trying to optimize. Im ending up with Using temporary; Using filesort in the explain and the query is slow on a category with say 60,000 rows. If i remove the Order By clauses the query runs very fast .05 seconds to do 60,000 rows. With the Order By clauses its very slow around 5 seconds. Parts contains some 500,000 rows as does Parts_Category.

I have a group index on Parts (status, level, warehouse, updated) called sort_index

At the top of the explain I have | ALL | Using temporary; Using filesort

All the other indexes are showing OK. Can somebody please tell me what the problem might be? Im out of ideas. Maybe i should rearrange this query so i can get better performance perhaps?

query.

SELECT Parts.*, Image.type, Image.width, Image.height,
(SELECT name FROM Location_State WHERE id = Parts.state_id) AS state, 
(SELECT name FROM Location_Region WHERE id = Parts.region_id) AS region, 
(SELECT start_date FROM Promotion WHERE id = Parts.promotion_id) AS promotion_start_date, 
(SELECT end_date FROM Promotion WHERE id = Parts.promotion_id) AS promotion_end_date 
FROM ( SELECT parts_id FROM Parts_Category WHERE Parts_Category.category_id = '40' 
UNION SELECT parts_id FROM Parts_Category WHERE Parts_Category.main_category_id = '40') cid 
LEFT JOIN Image ON Parts.image_id = Image.id
JOIN Parts ON Parts.id = cid.parts_id AND Parts.status = 'A'
ORDER BY Parts.level DESC, Parts.warehouse DESC, Parts.updated DESC LIMIT 0, 15
Table structure for table Parts

Field   Type Null Default
id  int(11) No auto_increment
image_id    int(11) Yes 0
gallery_id  int(11) Yes 0
image_count int(3) Yes 0
promotion_id    int(11) Yes 0
country_id  int(11) Yes NULL
state_id    int(11) Yes NULL
region_id   int(11) Yes NULL
city_id int(11) Yes NULL
area_id int(11) Yes NULL
updated datetime Yes 0000-00-00 00:00:00
entered datetime Yes 0000-00-00 00:00:00
renewal_date    date Yes 0000-00-00
discount_id varchar(10) Yes NULL
title           varchar(100) Yes 
search_title    varchar(255) Yes 
warehouse   varchar(50) Yes 
url varchar(255) Yes 
display_url varchar(255) Yes 
friendly_url    varchar(100) Yes NULL
description varchar(255) Yes 
keywords    varchar(1000) Yes NULL
attachment_file varchar(255) Yes 
attachment_caption  varchar(255) Yes 
status  char(1) Yes 
level   tinyint(3) Yes 0
worldwide   tinyint(1) Yes 0
random_number   int(11) Yes NULL
reminder    tinyint(4) Yes NULL
category_search varchar(1000) Yes 
video_snippet   varchar(1000) Yes 
importID    int(11) Yes 0

Indexes

PRIMARY             518623     id
random_number INDEX 32201   random_number
country_id  INDEX  1    country_id
state_id    INDEX  8    state_id
region_id   INDEX  5    region_id
renewal_date    INDEX  1    renewal_date
worldwide   INDEX  1    worldwide
friendly_url    INDEX  518623   friendly_url
promotion_id    INDEX  1    promotion_id
city_id      INDEX  1   city_id
area_id     INDEX  1    area_id
zip_code    INDEX  2790    zip_code
importID    INDEX  518623   importID
image_id    INDEX  10    image_id

--------------
index_browse_category   INDEX  52 
level
status
warehouse   
updated
-----------------
keywords    FULLTEXT  1 
description
keywords
category_search 


Parts_Category

id        int(11)     No   auto_increment   
parts_id     int(11)    No  0   
category_id      int(11)    No  0   
main_category_id    int(10)    No  0  

Index

PRIMARY          PRIMARY   519330      id
category_id          INDEX  519330    category_id
parts_id
main_category_id     INDEX  519330           main_category_id
parts_id





+5  A: 

Try rewriting your query as this:

SELECT  p.*, i.type, i.width, i.height,
        (SELECT name FROM Location_State WHERE id = p.state_id) AS state, 
        (SELECT name FROM Location_Region WHERE id = p.region_id) AS region, 
        (SELECT start_date FROM Promotion WHERE id = p.promotion_id) AS promotion_start_date, 
        (SELECT end_date FROM Promotion WHERE id = p.promotion_id) AS promotion_end_date 
FROM    parts p
LEFT JOIN
        image i
ON      i.id = p.image_id
WHERE   EXISTS (
        SELECT  NULL
        FROM    Parts_Category pc
        WHERE   pc.category_id = '40'
                AND pc.parts_id = p.id
        UNION ALL
        SELECT  NULL
        FROM    Parts_Category pc
        WHERE   pc.main_category_id = '40'
                AND pc.parts_id = p.id
        )
        AND p.status = 'A'
ORDER BY
        p.status DESC, p.level DESC, p.warehouse DESC, p.updated DESC
LIMIT   15

You need the following indexes for this to work efficiently:

parts (status, level, warehouse, updated) -- this one you have
parts_category (category_id, parts_id)
parts_category (main_category_id, parts_id)

Update:

I just created the tables as this:

DROP TABLE IF EXISTS `test`.`image`;
CREATE TABLE  `test`.`image` (
  `id` int(11) NOT NULL,
  `type` int(11) NOT NULL,
  `width` int(11) NOT NULL,
  `height` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `test`.`location_region`;
CREATE TABLE  `test`.`location_region` (
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `test`.`location_state`;
CREATE TABLE  `test`.`location_state` (
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `test`.`parts`;
CREATE TABLE  `test`.`parts` (
  `id` int(11) NOT NULL,
  `status` char(1) NOT NULL,
  `level` int(11) NOT NULL,
  `warehouse` int(11) NOT NULL,
  `updated` int(11) NOT NULL,
  `state_id` int(11) NOT NULL,
  `region_id` int(11) NOT NULL,
  `promotion_id` int(11) NOT NULL,
  `image_id` int(11) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  KEY `status` (`status`,`level`,`warehouse`,`updated`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `test`.`parts_category`;
CREATE TABLE  `test`.`parts_category` (
  `id` int(11) NOT NULL,
  `parts_id` int(11) NOT NULL,
  `category_id` int(11) NOT NULL,
  `main_category_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ix_pc_cat_parts` (`category_id`,`parts_id`),
  KEY `ix_pc_main_parts` (`main_category_id`,`parts_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `test`.`promotion`;
CREATE TABLE  `test`.`promotion` (
  `id` int(11) NOT NULL,
  `start_date` datetime NOT NULL,
  `end_date` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

and filled them with sample data:

INSERT
INTO    parts
SELECT  id,
        CASE WHEN RAND() < 0.1 THEN 'A' ELSE 'B' END,
        RAND() * 100,
        RAND() * 100,
        RAND() * 100,
        RAND() * 50,
        RAND() * 50,
        RAND() * 50,
        RAND() * 50
FROM    t_source
LIMIT 500000;
INSERT
INTO    parts_category
SELECT  id,
        id,
        RAND() * 100,
        RAND() * 100
FROM    t_source
LIMIT 500000;
INSERT
INTO    location_state
SELECT  id, CONCAT('State ', id)
FROM    t_source
LIMIT 1000;
INSERT
INTO    location_region
SELECT  id, CONCAT('Region ', id)
FROM    t_source
LIMIT 1000;
INSERT
INTO    promotion
SELECT  id,
        '2009-07-22' - INTERVAL RAND() * 5 - 20 DAY,
        '2009-07-22' - INTERVAL RAND() * 5 DAY
FROM    t_source
LIMIT 1000;

The query above runs for 30 milliseconds and yields the following plan:

1, 'PRIMARY', 'p', 'ref', 'status', 'status', '3', 'const', 107408, 'Using where'
1, 'PRIMARY', 'i', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.p.image_id', 1, ''
6, 'DEPENDENT SUBQUERY', 'pc', 'ref', 'ix_pc_cat_parts', 'ix_pc_cat_parts', '8', 'const,test.p.id', 1, 'Using index'
7, 'DEPENDENT UNION', 'pc', 'ref', 'ix_pc_main_parts', 'ix_pc_main_parts', '8', 'const,test.p.id', 1, 'Using index'
, 'UNION RESULT', '<union6,7>', 'ALL', '', '', '', '', , ''
5, 'DEPENDENT SUBQUERY', 'Promotion', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.p.promotion_id', 1, ''
4, 'DEPENDENT SUBQUERY', 'Promotion', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.p.promotion_id', 1, ''
3, 'DEPENDENT SUBQUERY', 'Location_Region', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.p.region_id', 1, ''
2, 'DEPENDENT SUBQUERY', 'Location_State', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.p.state_id', 1, ''

As you can see, no temporary, no filesort, everything's very fast.

To help you anymore, I just need to see how your tables are defined.

Quassnoi
Thanks again Quassnoi. Seem to be getting syntax error near Left Join on LEFT JOIN image i ?
John
Could you please provide your exact table definitions, so that I would be able to check syntax?
Quassnoi
@John: meanwhile, try now, I messed the clause order indeed :)
Quassnoi
Parts Table Col is just id not parts_id so I changed this line to AND pc.parts_id = p.id, now its working. Changing indexes takes a while so ill be back soon.
John
Explain | PRIMARY | p | ALL | 520,000 | Using where; Using filesort. Query is slower however - seems to be scanning all rows - before it was only scanning the amount equal to categories returned?
John
@John: I *really* need the table definitions to make other suggestions.
Quassnoi
I cannot stress the importance of avoiding DEPENDENT SUBQUERY types. **Please** read the MySQL documentation on EXPLAIN. There are cases when they can perform faster than alternatives, but they are uncommon. You will need to profile both using the DEP SUB and just JOINs
hobodave
@hobodave: believe me, I've read much of `MySQL` documentation, including that on `EXPLAIN`. With the table structures and data I provided, the query takes `30 ms`. You are free to use the table creation scripts I provided and check the performance yourself. `DEPENDENT SUBQUERIES` here will be run but `15` times, each using a single `INDEX` scan on a `PRIMARY KEY`. Don't see why they should be avoided.
Quassnoi
Sorry this site would not let me add comments so i had to reload browser. So much for Ajax! Im using MYISAM tables if that helps, ill post parts table in a tick
John
What is t_source Quassnoi?
hobodave
Agree , the DEPENDENT SUBQUERIES have almost no effect on the speed in my testing today.
John
The problem is with warehouse (its a name) and Updated (date) If i remove those from order by query runs very fast - no file sort either.I guess i could insert a flag for warehouse, so if there is a location specified i could make another col like has_warehouse_location tiny int 1 to sort on but i dont know about the date.
John
If it helps i dont need to order by the warehouse name , but rather wanted to put listings with a warehouse location before listings without if that makes sense.
John
@hobodave: it's a dummy table with just `1,000,000` `id`'s from `1` to `1,000,000`. I keep it around to populate other tables. I just forgot that not everyone has it, sorry :) In almost all of posts in my blog on `MySQL`, like this one: http://explainextended.com/2009/07/07/selecting-last-forum-posts, I provide a script to create a similar table, `filler`. You can use this script instead.
Quassnoi
Quassnoi, you are correct. Indexes where out of order. Thank you very much for your answer once again. Query runs like a rocket even on my windows/xammp dev server. On linux server it returns 10,000 records in 0.002 seconds, perfect.
John
To further improve the performance, and to take some load of the server I placed a limit on the union of 5000 which is another way you can speed up the query if you don't think that retuning all the pages is required. In reality, not many people will go through 330 pages anyway.
John
Is there any badge for an accepted answer with a negative score?
Quassnoi
Quassnoi is this something i can help you with? I didn't vote that down as im not registered.
John
@John: no, it's ok, pure curiosity. On this site, there are "badges" awarded for some peculiar things, like when your answer is accepted with 10 votes but there is another answer with 20 votes. I wondered if there is a badge for the situation like this.
Quassnoi
Ok, well you have my vote :) your are great, BTW: do you do any external freelance work? I hope im allowed to ask such a question!
John
@John: sure. I have a contact form on my site: http://explainextended.com/ask-a-question/, drop me a line.
Quassnoi
will do cheers.
John
Quassnoi I have discovered a slight glitch whereby if the sub category (Parts_Category category_id) only contains a small number like 6 the query takes 37 seconds. Checking the Handler it shows Handler_read_key at nearly 1 million and Handler_read_next at .5 million. It appears that all the rows in the Parts Table are being read regardless of the number of category_id?
John
@John: your query reads the `parts` in index order, checks each row of `parts` against the `parts_category`. If there are but few rows in `parts` that satisfy the `EXISTS` condition and they happen to be closer to the end of the table (in index order), then yes, almost whole table will be read before first `15` values can be returned.
Quassnoi
@John: in fact, you have an `ORDER BY` condition and a `RANGE` condition that cannot be satisfied by a single index. This is a classical problem. You can read more about it in this article in my blog: http://explainextended.com/2009/04/01/choosing-index/ One possible solution for this is that you can collect statistics for your parts_category table and build your query dynamically, taking these statistics into account.
Quassnoi
Quassnoi "your query reads the parts in index order, checks each row of parts against the parts_category." Would it not seem more logical to do it the other way round and select the required ids in the Part Table from the category table?
John
@John: yes, but you will need to sort then, which is long if you have lots of values. In this case, you cannot use a single index both for sorting and for filtering. This is the problem I mentioned in the previous comment. You need to choose: either to use an index for sorting and filter the sorted results, or use an index for filtering and sort the filtered results. If you have lots of results, it's better to sort then filter, otherwise, it's better to filter then sort.
Quassnoi
You need to collect statistics over your `parts_category` (probably daily using a `cron` script), and for categories that have few parts in them you should run another query (which filters then sorts). `SQL Server` and `Oracle` keep the index histograms and try do to this automatically, but in `MySQL` you should do in manually.
Quassnoi
I already have a cron that counts active listings by category and those values are stored on the Category table. So, if im understanding correctly you mean do something like if total count is > xxx this $sql else this $sql. Seems like it can be done.
John
@John: Yes. This actually can be done with a single query, but two queries are also OK.
Quassnoi
@John: sure. Please post it as another question, or contact me via the site.
Quassnoi
+1  A: 

John, the problem is your query is constructed such that it is selecting from a derived table. The derived table cannot benefit from your indices. Try updating your query as follows:

SELECT
  Parts.*,
  Image.type, Image.width, Image.height,
  Location_State.name AS state,
  Location_Region.name AS region,
  Promotion.start_date AS promotion_start_date,
  Promotion.end_date AS promotion_end_date
FROM Parts
LEFT JOIN Image ON Parts.image_id = Image.id
LEFT JOIN Location_State ON Parts.state_id = Location_State.id
LEFT JOIN Location_Region ON Parts.state_id = Location_Region.id
LEFT JOIN Promotion ON Parts.promotion_id = Promotion.id
INNER JOIN Parts_Category ON (Parts_Category.category_id = 40 OR Parts_Category.main_category_id = 40)
WHERE Parts.status = 'A'
GROUP BY Parts.id
ORDER BY Parts.level DESC, Parts.warehouse DESC, Parts.updated DESC LIMIT 0, 15

Note, if you don't need LEFT JOIN's for your Location_State, Location_Region, Promotion tables, then use an INNER JOIN instead. It will likely perform better.

To further assist in optimizing this query, please provide the following:

SHOW CREATE TABLE Parts;

If the rewritten query I provided works the same as your example (it should), then also provide:

EXPLAIN <my query here>\G
hobodave
`DESC` clause is ignored by `MySQL`. All indexed values are stored in ascending order.
Quassnoi
Seems you are correct. I never noticed that portion of the MySQL documentation. Thanks Quassnoi.
hobodave
Thanks hobodave, If i use OR for category i would need to then add Group By clause on parts id because the part may be listed in multiple categories so we dont want them showing multiple times in the same results. Group By is evil whne mixed with order by hence the reason we used UNION. Not Union ALL
John
You are correct with the GROUP BY, but can you provide any substance to your statement that "GROUP BY is evil"? The query you presented is inefficient, and highly unlikely to perform better than what I suggest (with an added GROUP BY). Using EXPLAIN on your query will show a terrible mess of DEPENDENT SUBQUERY, derived tables, etc.
hobodave
Hobodave, with this i get a Sytnax error somewhere near * on second line.
John
John, oops! I removed the ()'s from the SELECT, they should not have been there.
hobodave
hobodave, I ran the query , I had to terminate MySQL server after 5 mins. It locked up my sever.
John