views:

101

answers:

3

Hello!

Here's my problem. I have a many-to-many table called 'user_has_personalities'. In my application, users can have many personalities, and a personality can belong to many users.

The table has two integer columns, user_id and personality_id.

What I need to do is get all users that have at least all of the personalities (a set of personality_ids of variable size) which I supply to the query.

For an example, I'd like to get all users that have personalities with ids 4, 5, 7, but can also have some other personalities. But I need the query to work for a variable number of wanted personality ids, like 4, 5, 7, 9, 10 for an example.

Any ideas?

+5  A: 

This query does the job:

select  user_id
from    user_has_personalities
where   personality_id in (<list-of-personality-ids>)
group by user_id
having count(*) = <numer-of-items-in-IN-list>

You need to supply a comma-separated list of personality ids for <list-of-personality-ids> and you also need to provide the number of items in th elist. Sticking to your example, you would get:

select  user_id
from    user_has_personalities
where   personality_id in (4,5,7)
group by user_id
having count(*) = 3

this ensures you only get users that have all these personalities.

Roland Bouman
this just tells if they have one of the personality_id's. The question explicitly says ALL of the ids.
Evan Carroll
@Evan: this query tells if the user has all of the ids.
Quassnoi
I see what you're doing. Nice.
Evan Carroll
+1 Yep, this is how to do it.
David Oneill
Thanks! It works great.
Martin
+4  A: 
SELECT  *
FROM    (
        SELECT  DISTINCT user_id
        FROM    user_has_personalities
        ) uhpo
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    user_has_personalities uhpi
        WHERE   uhpi.user_id  = uhpo.user_id
                AND personality_id IN (4, 5, 6, 9, 10)
        LIMIT 1 OFFSET 4
        )

Offset value should be 1 less than the number of items in the IN list.

If you have your personality list in a dedicated table, use this:

SELECT  *
FROM    (
        SELECT  DISTINCT user_id
        FROM    user_has_personalities
        ) uhpo
WHERE   (
        SELECT  COUNT(*)
        FROM    perslist p
        JOIN    user_has_personalities uhpi
        ON      uhpi.user_id = uhpo.user_id
                AND uhpi.personality_id = p.id
        ) =
        (
        SELECT  COUNT(*)
        FROM    perslist
        )

For this to work correctly (and fast), you need to have a UNIQUE index on user_has_personalities (user_id, personality_id) (in this order).

If you have a users table and almost all users have a record in user_has_personalities, then substitute it in place of the DISTINCT nested query:

SELECT  user_id
FROM    users uhpo
WHERE   (
        SELECT  COUNT(*)
        FROM    perslist p
        JOIN    user_has_personalities uhpi
        ON      uhpi.user_id = uhpo.user_id
                AND uhpi.personality_id = p.id
        ) =
        (
        SELECT  COUNT(*)
        FROM    perslist
        )
Quassnoi
+1 for general mastery and wizard-ness :) Quassnoi, I love your blog and I linked you. I would like your blog even more if you'd open comments on it, but obviously its your prerogative to leave it as is. Anyway, cheers and keep it up :)
Roland Bouman
@Roland: every article in category `Miscellaneous` has comments enabled.
Quassnoi
@Quassnoi: ok, fair enough :) thanks.
Roland Bouman
+1  A: 
SELECT a.user_id
FROM user_has_personalities a
JOIN user_has_personalities b ON a.user_id = b.user_id AND b.personality_id = 5
JOIN user_has_personalities c ON a.user_id = c.user_id AND b.personality_id = 7
WHERE a.personality_id = 4

It would be easy enough to generate this list programatically, but it's not exactly as easy as supplying a set. On the other hand, it is efficient.

eswald