Hi, I'm having trouble working out a query. I've tried subqueries, different joins and group_concat() but they either don't work or are painfully slow. This may be a bit complicated to explain, but here's the problem:

I have a table "item" (with about 2000 products). I have a table "tag" (which contains about 2000 different product tags). And I have a table "tagassign" (which connects the tags to the items, with about 200000 records).

I'm using the tags to define characteristics of the products, for example colour, compatibility, whether the product is on special offer etc. If I want to show the products that have a certain tag assigned to them, I use a simple query like:

select * from item, tagassign
  where item.itemid = tagassign.itemid
  and tagassign.tagid = 'specialoffer'

The problem is that I may want to see items that have several tags at once. For example, I might want to see only the black cell phone cases that are compatible with the Apple iPhone and are new. So I basically want all records from the item table that have the tags "black" and "case" and "iphone" and "new". The only way I can get this to work is to join tagassign once per tag, each under its own alias (select * from item, tagassign, tagassign as t1, tagassign as t2, tagassign as t3 etc.). In some cases I might be looking for 10 or 20 different tags, and with that many records the queries are dreadfully slow.

I know I'm missing something obvious. Any ideas? Thanks!

+2  A: 
SELECT  *
FROM    item i
WHERE   (
        SELECT  COUNT(*)
        FROM    tagassign ta
        WHERE   ta.tagid IN ('black', 'case', 'iphone', 'new')
                AND ta.itemid = i.itemid
        ) = 4

Substitute the actual number of the tags you are searching for instead of 4.

Create a unique index or a primary key on tagassign (itemid, tagid) (in this order) for this to work fast.
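
To try the correlated-subquery form without hand-editing the tag list and the count each time, here is a minimal sketch. It assumes Python's built-in sqlite3 as a stand-in for the real database; the sample rows and the helper name items_with_all_tags are made up for illustration. The primary key on (itemid, tagid) plays the role of the unique index described above.

```python
import sqlite3


def items_with_all_tags(conn, tags):
    """Return the itemids of items that carry every tag in `tags` (hypothetical helper)."""
    tags = list(dict.fromkeys(tags))  # drop repeated tags so the count stays correct
    if not tags:
        raise ValueError("at least one tag is required")
    placeholders = ", ".join("?" for _ in tags)
    sql = f"""
        SELECT i.itemid
        FROM   item i
        WHERE  (
               SELECT COUNT(*)
               FROM   tagassign ta
               WHERE  ta.tagid IN ({placeholders})
                      AND ta.itemid = i.itemid
               ) = ?
        ORDER BY i.itemid
    """
    # The last parameter is the number of tags searched for.
    return [row[0] for row in conn.execute(sql, (*tags, len(tags)))]


# Illustrative data only; the real tables are much larger.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (itemid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tagassign (
        itemid INTEGER NOT NULL,
        tagid  TEXT    NOT NULL,
        PRIMARY KEY (itemid, tagid)  -- unique, itemid first
    );
    INSERT INTO item VALUES
        (1, 'black iphone case'), (2, 'red iphone case'), (3, 'black charger');
    INSERT INTO tagassign VALUES
        (1, 'black'), (1, 'case'), (1, 'iphone'), (1, 'new'),
        (2, 'red'),   (2, 'case'), (2, 'iphone'), (2, 'new'),
        (3, 'black'), (3, 'new');
""")

matches = items_with_all_tags(conn, ["black", "case", "iphone", "new"])
print(matches)
```

Passing the tags as bound parameters keeps the count in step with the list, so searching for 10 or 20 tags needs no change to the query text.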

If you are searching for lots of tags (or for tags that are used rarely), this query may also be faster:

SELECT  i.*
FROM    (
        SELECT  itemid
        FROM    tagassign ta
        WHERE   ta.tagid IN ('black', 'case', 'iphone', 'new')
        GROUP BY
                itemid
        HAVING  COUNT(*) = 4
        ) t
JOIN    item i
ON      i.itemid = t.itemid

For this query, you would need a unique index on tagassign (tagid, itemid).
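
The GROUP BY variant can be sketched the same way. This again assumes Python's sqlite3 as a stand-in for the real database, with invented sample rows; the helper name find_items and the index name ux_tagassign_tag_item are hypothetical. The unique index is created with tagid first, as suggested above, and it is also what makes COUNT(*) safe: a tag cannot be assigned to the same item twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (itemid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tagassign (itemid INTEGER NOT NULL, tagid TEXT NOT NULL);
    -- Unique index with tagid first (index name is made up for this sketch).
    CREATE UNIQUE INDEX ux_tagassign_tag_item ON tagassign (tagid, itemid);
    INSERT INTO item VALUES
        (1, 'black iphone case'), (2, 'red iphone case'), (3, 'black charger');
    INSERT INTO tagassign VALUES
        (1, 'black'), (1, 'case'), (1, 'iphone'), (1, 'new'),
        (2, 'red'),   (2, 'case'), (2, 'iphone'), (2, 'new'),
        (3, 'black'), (3, 'new');
""")


def find_items(conn, tags):
    """Return (itemid, name) rows for items that have all of `tags` (hypothetical helper)."""
    tags = sorted(set(tags))  # each tag once, so HAVING compares against the right number
    if not tags:
        raise ValueError("at least one tag is required")
    placeholders = ", ".join("?" for _ in tags)
    sql = f"""
        SELECT i.itemid, i.name
        FROM   (
               SELECT   itemid
               FROM     tagassign
               WHERE    tagid IN ({placeholders})
               GROUP BY itemid
               HAVING   COUNT(*) = ?
               ) t
        JOIN   item i ON i.itemid = t.itemid
        ORDER BY i.itemid
    """
    return conn.execute(sql, (*tags, len(tags))).fetchall()


rows = find_items(conn, ["black", "case", "iphone", "new"])
print(rows)
```

If tagassign could contain duplicate (tagid, itemid) pairs, COUNT(*) would overcount and COUNT(DISTINCT tagid) would be the safer comparison.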

Quassnoi
Curious about the syntax choice - why not use EXISTS and specify `HAVING COUNT(DISTINCT ta.tagid) = 4`? It wasn't there when I was commenting, I swear...
OMG Ponies
@OMGPonies: Since the index is unique, the `COUNT(*)` will only count the matching records, each one once. `EXISTS` would be overkill there. You can provide an answer with it though, just for completeness, I'll upvote it :)
Quassnoi
+1: I missed the unique constraint, distracted by the syntax in the first option
OMG Ponies
THANKS Quassnoi for the perfect answer - it works exactly as I want, and is fast and simple. Thank you!
Matt