views:

82

answers:

2

Currently have an 'item' table, and a 'pair' table. The pair table simply contains two columns, which contain the primary key from the item table.

A common query is to find a number of items that are featured in the least number of pairs.

SELECT id,COUNT(*) AS count FROM item i LEFT JOIN pair p ON (i.id = p.id1 OR i.id = p.id2) GROUP BY id ORDER BY count,RAND() LIMIT 100

but the query is horible performance wise. There is an index on id1,id2 on pair.

+----+-------------+-------+-------+---------------+------+---------+------+-------+---------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                           |
+----+-------------+-------+-------+---------------+------+---------+------+-------+---------------------------------+
|  1 | SIMPLE      | item  | ALL   | NULL          | NULL | NULL    | NULL |  5644 | Using temporary; Using filesort |
|  1 | SIMPLE      | pair  | index | id1           | id1  | 8       | NULL | 18377 | Using index                     |
+----+-------------+-------+-------+---------------+------+---------+------+-------+---------------------------------+

Is there a better query, and/or data structure for this type of thing?

A: 

If you have an index on (id1,id2) you should also have an index on id2, for those cases where you are matching against id2 by itself. (you get an index on id1 for free as part of the (id1,id2) index)

Jason S
I tried that, didnt make any difference on the above query tho. (as I understand it using two indexes to satisfy a single where/on clause doesnt work)
barryhunter
+1  A: 

You need to create two indexes on pair:

CREATE INDEX ix_pair_1 ON pair (id1)
CREATE INDEX ix_pair_2 ON pair (id2)

and rewrite your query as this:

SELECT  (
        SELECT  COUNT(*)
        FROM    pair
        WHERE   id1 = i.id
        ) + 
        (
        SELECT  COUNT(*)
        FROM    pair
        WHERE   id2 = i.id
        ) AS cnt
FROM    item i
ORDER BY
        cnt, RAND()
LIMIT 100
Quassnoi
thanks, that works great!!!!`I'd tried it as SELECT id,COUNT(p1.id1)+COUNT(p2.id2) AS count FROM item i LEFT JOIN pair p1 ON (i.id = p1.id1) LEFT JOIN pair p2 (i.id = p2.id2) GROUP BY id ORDER BY count,RAND() LIMIT 100` which worked reasonably well. But done as subqueries is magnitudes faster - guessing the two subqueries can be answerd from the indexes alone.
barryhunter