I have an m:n relationship between users and tags: one user can have m tags, and one tag can belong to n users. The tables look something like this:

USER:
ID
USER_NAME

USER_HAS_TAG:
USER_ID
TAG_ID

TAG:
ID
TAG_NAME

Let's say I need to select all users who have the tags "apple", "orange" AND "banana". What would be the most efficient way to accomplish this using SQL (MySQL DB)?

+3  A: 
SELECT  u.*
FROM    (
        SELECT  user_id
        FROM    tag t
        JOIN    user_has_tag uht
        ON      uht.tag_id = t.id
        WHERE   tag_name IN ('apple', 'orange', 'banana')
        GROUP BY
                user_id
        HAVING  COUNT(*) = 3
        ) q
JOIN    user u
ON      u.id = q.user_id

By removing the HAVING COUNT(*) condition, you get OR instead of AND (though it will not be the most efficient way to do that).

By replacing 3 with 2, you get users that have exactly two of three tags defined.

By replacing = 3 with >= 2, you get users that have at least two of three tags defined.
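
As a rough illustration of the "at least two of three" variant, here is a sketch that also reports which tags matched. The aliases matched_count and matched_tags are made up for this example, and it assumes each (user_id, tag_id) pair occurs at most once in user_has_tag:

```sql
-- Users that have at least two of the three tags, plus which tags matched.
-- Assumption: user_has_tag holds no duplicate (user_id, tag_id) rows.
SELECT  u.*, q.matched_count, q.matched_tags
FROM    (
        SELECT  uht.user_id,
                COUNT(*) AS matched_count,
                GROUP_CONCAT(t.tag_name ORDER BY t.tag_name) AS matched_tags
        FROM    tag t
        JOIN    user_has_tag uht
        ON      uht.tag_id = t.id
        WHERE   t.tag_name IN ('apple', 'orange', 'banana')
        GROUP BY
                uht.user_id
        HAVING  COUNT(*) >= 2
        ) q
JOIN    user u
ON      u.id = q.user_id;
```

Only the HAVING condition differs from the query above; the extra columns are just there to make the result easier to inspect.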

Quassnoi
That is for sure not the most efficient, as it will aggregate all records. E.g. if no users match the criteria, much useless work will be done. A self-join is the efficient way to go.
noonex
`@noonex`: On real-world data (lots of users, lots of tags, high user-tag cardinality) this is an efficient way. `tag_name IN (...)` is a sargable condition, so it will aggregate only the records with the matching tags. And what if you need to make the query match `4` or `20` tags? With self-joins you will need to rewrite the query structure; with `GROUP BY` you change only the parameters.
Quassnoi
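
For the sargable condition to pay off, the lookup columns would typically need indexes. A minimal sketch, assuming no such indexes exist yet (the index names are hypothetical):

```sql
-- Hypothetical index names; adjust to the real schema.
-- Lets the tag_name IN (...) filter resolve tag ids by index lookup.
CREATE INDEX ix_tag_tag_name ON tag (tag_name);

-- Lets the join find the users of each matching tag without a table scan.
CREATE INDEX ix_uht_tag_user ON user_has_tag (tag_id, user_id);
```

Whether the optimizer actually uses them depends on the data, so it is worth checking the plan with EXPLAIN.
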
A: 
SELECT *
FROM USER u
INNER JOIN USER_HAS_TAG uht
ON u.id = uht.user_id
INNER JOIN TAG t
ON uht.TAG_ID = t.ID
WHERE t.TAG_NAME IN ('apple','orange','banana')
SiC
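This does not work: it returns users that have any of the listed tags (one row per matching tag), not only the users that have all three.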
tputkonen
+2  A: 

You can do it all with joins...

select u.*
from user u

inner join user_has_tag ut1 on u.id = ut1.user_id 
inner join tag t1 on ut1.tag_id = t1.id and t1.tag_name = 'apple'

inner join user_has_tag ut2 on u.id = ut2.user_id 
inner join tag t2 on ut2.tag_id = t2.id and t2.tag_name = 'orange'

inner join user_has_tag ut3 on u.id = ut3.user_id 
inner join tag t3 on ut3.tag_id = t3.id and t3.tag_name = 'banana'
Adam
Technically, a more efficient way would be to use the appropriate tag_id values and self-join only the user_has_tag table (3 times). But the approach is correct.
noonex
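
A sketch of what that comment seems to suggest, assuming the tag ids have already been looked up (1, 2 and 3 are placeholder values, not real ids):

```sql
-- Self-join user_has_tag three times, once per required tag.
-- Assumption: 1, 2, 3 stand in for the ids of 'apple', 'orange', 'banana'.
SELECT u.*
FROM user u
INNER JOIN user_has_tag ut1 ON ut1.user_id = u.id AND ut1.tag_id = 1
INNER JOIN user_has_tag ut2 ON ut2.user_id = u.id AND ut2.tag_id = 2
INNER JOIN user_has_tag ut3 ON ut3.user_id = u.id AND ut3.tag_id = 3;
```

This drops the three joins to the tag table, at the cost of a separate lookup of the ids by tag_name beforehand.
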
+1  A: 

In addition to the other good answers, it's also possible to check the condition in a WHERE clause:

select *
from user u
where 3 = (
    select count(distinct t.id)
    from user_has_tag uht
    inner join tag t on t.id = uht.tag_id
    where t.tag_name in ('apple', 'orange', 'banana')
    and uht.user_id = u.id
)

The count(distinct ...) makes sure a tag is counted only once, even if the user has multiple 'banana' tags.
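
If duplicates are not wanted in the first place, one option is to forbid them at the schema level. A minimal sketch, assuming user_has_tag has no primary key yet and currently contains no duplicate rows:

```sql
-- Assumption: no existing primary key and no duplicate (user_id, tag_id) rows.
-- After this, each user can hold a given tag at most once.
ALTER TABLE user_has_tag
    ADD PRIMARY KEY (user_id, tag_id);
```

With that constraint in place, a plain count(*) would give the same result as count(distinct ...) here.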

By the way, the site fruitoverflow.com is not yet registered :)

Andomar