views:

23

answers:

3

hi guys, i'm trying to get optimize an very old query that i can't wrap my head around. the result that i want to archive is that i want to recommend the visitor on a web shop what other customers have shown interest in, i.e. what else they have bought together with the product that the visitor is looking at.

i have a subquery but it's very slow, takes ~15s on ~8 000 000 rows.

the layout is that all products that are put in a users basket are kept in a table wsBasket and separated by a basketid (which in another table is associated with a member).

in this example i want to list all the most popular products that users have bought together with productid 427, but not list the productid 427 itself.

SELECT productid, SUM(quantity) AS qty 
FROM wsBasket 
WHERE basketid IN 
    (SELECT basketid 
     FROM wsBasket 
     WHERE productid=427) AND productid!=427 
GROUP by productid 
ORDER BY qty 
DESC LIMIT 0,4;

any help is much appreciated! hope this makes any sense at all to at least someone :)

 

UPDATE 1: thanks for your comments guys here are my answers, they didn't fit in the comments-field.

Using EXPLAIN on the above query i got the fllowing. Please note, I do not have any indexes on the table (except for primary key on the id-field), i want to modify the query to benefit from indexes and place indexes on the right keys.

+----+--------------------+----------+------+---------------+------+---------+------+------+----------------------------------------------+
| id | select_type        | table    | type | possible_keys | key  | key_len | ref  | rows | Extra                                        |
+----+--------------------+----------+------+---------------+------+---------+------+------+----------------------------------------------+
|  1 | PRIMARY            | wsBasket | ALL  | NULL          | NULL | NULL    | NULL | 2821 | Using where; Using temporary; Using filesort |
|  2 | DEPENDENT SUBQUERY | wsBasket | ALL  | NULL          | NULL | NULL    | NULL | 2821 | Using where                                  |
+----+--------------------+----------+------+---------------+------+---------+------+------+----------------------------------------------+
+1  A: 

Two obvious indexes to add: one on basketid and a second on productid: then retry the query and a new EXPLAIN to see that the indexes are being used

Mark Baker
that did the trick, ~0.124 for a query now! :) and i was looking into combined indexes and modifing the query, well. thanks! are there any way to avoid the subquery?
johan
Combined/Composite indexes may be useful for other queries against this table, but only you know what other types of queries are executed so we can't help with that without further information... but I suspect those two simple indexes will help with other areas of the application code as well
Mark Baker
See bobince's response for modifying the query
Mark Baker
A: 

The two fields you mainly use for searching in this query are productid and basketid.

When you search for records having productid equal to 427, Database has no clue where to find this record. It doesn't even know that if it does find one matching, that there will not be another matching one, so it has to look through the entire table, potentially thousands of records.

An index is a separate file that is sorted, and contains only the field/s you're interested in sorting on. so creating an index saves a immense amount of time!

Prav
+1  A: 

As well as ensuring that suitable indexes exist on productid and basketid, you will often benefit from structuring your query as a simple join rather than a subquery, especially in MySQL.

SELECT b1.productid, SUM(b1.quantity) AS qty
FROM wsBasket AS b0
JOIN wsBasket AS b1 ON b1.basketid=b0.basketid
WHERE b0.productid=427 AND b1.productid<>427
GROUP BY b1.productid
ORDER BY qty DESC
LIMIT 4

For me, on a possibly-similar dataset, the join resulted in two select_type: SIMPLE rows in the EXPLAIN output, whereas the subquery method spat out a horrible-for-performance DEPENDENT SUBQUERY. Consequently the join was well over an order of magnitude faster.

bobince
+1 for the join query
Mark Baker