tags:

views:

380

answers:

3

Hi,

Table 1: Tracks

Table 2: Wordlist

Table 3: N:M Track has Words (trackwords)

Find all tracks which have all the words.

currently the query looks like:

SELECT DISTINCT t.id FROM track as t
Left Join trackwords as tw ON t.id=tw.trackid
Left Join wordlist as wl on wl.id=tw.wordid
WHERE 
wl.trackusecount>0 
group by t.id
HAVING SUM(IF(wl.word IN ('folsom','prison','blues'),1,0)) = 3;

Which according to EXPLAIN is using all indexes neccessary:

+----+-------------+-------+--------+-----------------------+---------+---------+----------------+---------+-------------+
| id | select_type | table | type   | possible_keys         | key     | key_len | ref            | rows    | Extra       |
+----+-------------+-------+--------+-----------------------+---------+---------+----------------+---------+-------------+
|  1 | SIMPLE      | t     | index  | PRIMARY               | PRIMARY | 4       | NULL           | 8194507 | Using index | 
|  1 | SIMPLE      | tw    | ref    | wordid,trackid        | trackid | 4       | mbdb.t.id      |       3 | Using where | 
|  1 | SIMPLE      | wl    | eq_ref | PRIMARY,trackusecount | PRIMARY | 4       | mbdb.tw.wordid |       1 | Using where | 
+----+-------------+-------+--------+-----------------------+---------+---------+----------------+---------+-------------+

But the query takes ages. Any suggestion to speedup the query?

+2  A: 

Your problem set is very much like that of storing tags for an item like StackOverflow or Del.icio.us does.

This link should help point you in the right direction.

http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html

thinkzig
uhhh very nice link indeed, thanks.
Rufinus
+3  A: 

There's no point in left joins if you're only looking for tracks that have all the words. I'm assuming that (trackid,wordid) combination is unique in trackwords.

SELECT t.id
  FROM track as t,  trackwords as tw, wordlist as wl
 WHERE t.id=tw.trackid
   AND wl.id=tw.wordid
   AND wl.trackusecount>0 /* not sure what that is - you have it in your query */
   AND wl.word in ('folsom','prison','blues')
 GROUP by t.id
HAVING count(*) = 3

This query would benefit from indexes on wordlist(word), trackwords(trackid,wordid) and track(id).

ChssPly76
your query is using a temptable which makes it 3 times slower.but nevermind, your having count() = 3 is a a good idea und brings me to: SELECT DISTINCT t.name, t.id FROM track as t Left Join trackwords as tw ON t.id=tw.trackid Left Join wordlist as wl on wl.id=tw.wordid WHERE wl.trackusecount>0 AND wl.word IN ('folsom','prison','blues') group by t.id HAVING Count(t.id) = 3 LIMIT 0,10;which is nearly 10times faster then my orginal query :)
Rufinus
It should not use a temp table if you have appropriate indexes defined. You'll have 3 rows in wordlist, index lookup on trackwords and index lookup on tracks
ChssPly76
strange, in some cases your version is faster, in some cases mine is faster.(taking different words to test, query cache is strong in this case :-)btw. your solution is the same as mentioned on the link from thinkzig. so i guess its right :-)
Rufinus
explain says so: PRIMARY,word,trackusecount | word | 767 | NULL | 2 | Using where; Using temporary; Using filesort
Rufinus
gave the accepted answer to thinkzig, dont be mad, just giving a newb like me some points :-)
Rufinus
I don't get mad, I get even. Joking, of course :-) thinkzig's article is very detailed. As far as explain you've posted goes - perhaps you're selecting additional columns in that query? Or you don't have the indexes I've mentioned?
ChssPly76
A: 

Would probably be faster if you broke this up into two queries. First, a join of the words and trackwords to net you all the trackids you need. Then go back to the track table and do:

WHERE t.id IN(...trackids here...)

but based on the query above all you're returning is t.id which you have from tw.trackid already.

Swish
theoreticly right, but i shortend the SELECT part of the query to make it easier to understand :)i tried this with seperated queries, and sub selects, all versions were slower :(
Rufinus