Hi

I have two tables, sets and groups, joined through a third table, set_has_group. I would like to get the sets that belong to ALL of the groups I specify.

One way of doing it would be:

SELECT column1, column2 FROM sets WHERE 
id IN(SELECT set_id FROM set_has_group WHERE group_id = 1)
AND id IN(SELECT set_id FROM set_has_group WHERE group_id = 2)
AND id IN(SELECT set_id FROM set_has_group WHERE group_id = 3)

Obviously this is not the most beautiful solution.

I've also tried this:

SELECT column1, column2 FROM sets WHERE 
id IN(SELECT set_id FROM set_has_group WHERE group_id IN(1,2,3) GROUP BY group_id
HAVING COUNT(*) = 3

This looks prettier, but it takes forever to execute: the first query runs in about 200 ms, while the second one takes more than a minute.

Any idea why that is?

===UPDATE: I've played with this some more and modified the 2nd query like this:

SELECT columns FROM `set` WHERE id IN(
   select set_id FROM
      (
         SELECT set_id FROM set_has_group 
         WHERE group_id IN(1,2,3)
         GROUP BY set_id HAVING COUNT(*) = 3
      ) as temp      
)

That is really fast. It's the same as the 2nd query before, except that I wrap it in another temporary table. Pretty strange.
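
For what it's worth, the effect of the GROUP BY column can be checked on a toy dataset. This is only a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for the real database; the three sample sets are made up, and it says nothing about the timings above. COUNT(DISTINCT group_id) is used instead of COUNT(*) so that a duplicated (set_id, group_id) row could not inflate the count.

```python
import sqlite3

# Hypothetical sample data: sets 1 and 2 are in groups 1, 2 and 3; set 3 is only in 1 and 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sets (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT);
    CREATE TABLE set_has_group (set_id INTEGER, group_id INTEGER);
    INSERT INTO sets VALUES (1, 'a1', 'a2'), (2, 'b1', 'b2'), (3, 'c1', 'c2');
    INSERT INTO set_has_group VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 2), (2, 3),
        (3, 1), (3, 2);
""")

# Grouping by set_id counts how many of the wanted groups each set has.
by_set = conn.execute("""
    SELECT id FROM sets WHERE id IN (
        SELECT set_id FROM set_has_group
        WHERE group_id IN (1, 2, 3)
        GROUP BY set_id
        HAVING COUNT(DISTINCT group_id) = 3)
    ORDER BY id
""").fetchall()

# Grouping by group_id counts how many sets each group has, a different question.
per_group = conn.execute("""
    SELECT group_id, COUNT(*) FROM set_has_group
    WHERE group_id IN (1, 2, 3)
    GROUP BY group_id
    ORDER BY group_id
""").fetchall()

print(by_set)     # ids of the sets that have all three groups
print(per_group)  # (group_id, number of sets) pairs
```

Grouping by set_id is the form that answers "which sets have all three groups"; grouping by group_id produces one row per group, so the HAVING clause ends up testing the wrong count.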

+1  A: 

I suspect a small typo in the second query (it groups by group_id rather than set_id, and the closing parenthesis is missing).

As for the speed, I am not sure. The second query is probably executed with a full table scan, whereas the IN in the first query is actually transformed into EXISTS. So you can try an EXISTS-style correlated subquery. For example:

...
where 3 = (select count(*) from set_has_group 
    where group_id in (1, 2, 3) and set_id = id
    group by set_id)
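
To make the suggestion concrete, here is a rough sketch that runs both the correlated COUNT form above and an explicit EXISTS form on a few made-up rows. It assumes SQLite via Python's sqlite3 module purely for illustration, so it only checks that the two forms return the same sets; the aliases s and g and the sample data are not from the question.

```python
import sqlite3

# Hypothetical data: sets 1 and 3 have groups 1-3, set 2 lacks group 3, set 4 has no groups.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sets (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT);
    CREATE TABLE set_has_group (set_id INTEGER, group_id INTEGER);
    INSERT INTO sets VALUES (1, 'a1', 'a2'), (2, 'b1', 'b2'), (3, 'c1', 'c2'), (4, 'd1', 'd2');
    INSERT INTO set_has_group VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 2),
        (3, 1), (3, 2), (3, 3);
""")

# Correlated scalar subquery: the count is compared with 3 for each outer row.
count_form = conn.execute("""
    SELECT id FROM sets
    WHERE 3 = (SELECT COUNT(*) FROM set_has_group
               WHERE group_id IN (1, 2, 3) AND set_id = sets.id
               GROUP BY set_id)
    ORDER BY id
""").fetchall()

# Explicit EXISTS: the subquery yields a row only when all three groups are present.
exists_form = conn.execute("""
    SELECT s.id FROM sets s
    WHERE EXISTS (SELECT 1 FROM set_has_group g
                  WHERE g.set_id = s.id AND g.group_id IN (1, 2, 3)
                  GROUP BY g.set_id
                  HAVING COUNT(DISTINCT g.group_id) = 3)
    ORDER BY s.id
""").fetchall()

print(count_form, exists_form)
```

Whether either form is faster than the original depends on the database and its optimizer, so it is worth checking the execution plan on the real tables.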
serge_bg
A: 

Assuming SQL Server, here is a working example with a JOIN that should work better than the IN clauses you are using, as long as your primary and foreign keys are set up correctly. I have joined 5 sets to 3 groups, but sets 4 and 5 are not part of group 3 and will not show in the answer. However, this query is not scalable: finding sets in groups 4, 5, 7, 8 and 13, for example, will require code modifications unless you parse the input parameters into a table variable.

set nocount on

declare @sets table
(
Id  INT Identity (1, 1),
Column1 VarChar (50),
Column2 VarChar (50)
)

declare @Set_Has_Group table
(
    Set_Id Int,
    Group_Id Int
)

insert into @sets values (newid(), newid())
insert into @sets values (newid(), newid())
insert into @sets values (newid(), newid())
insert into @sets values (newid(), newid())
insert into @sets values (newid(), newid())

update @sets set column1 = 'Column1 at Row ' + Convert (varchar, id)
update @sets set column2 = 'Column2 at Row ' + Convert (varchar, id)

insert into @Set_Has_Group values (1, 1)
insert into @Set_Has_Group values (1, 2)
insert into @Set_Has_Group values (1, 3)
insert into @Set_Has_Group values (2, 1)
insert into @Set_Has_Group values (2, 2)
insert into @Set_Has_Group values (2, 3)
insert into @Set_Has_Group values (3, 1)
insert into @Set_Has_Group values (3, 2)
insert into @Set_Has_Group values (3, 3)
insert into @Set_Has_Group values (4, 1)
insert into @Set_Has_Group values (4, 2)
insert into @Set_Has_Group values (5, 1)
insert into @Set_Has_Group values (5, 2)

/* your query with IN */
SELECT column1, column2 FROM @sets WHERE 
id IN(SELECT set_id FROM @set_has_group WHERE group_id = 1)
AND id IN(SELECT set_id FROM @set_has_group WHERE group_id = 2)
AND id IN(SELECT set_id FROM @set_has_group WHERE group_id = 3)

/* my query with JOIN */
SELECT * -- Column1, Column2
FROM    @sets sets
WHERE 3 = (
    SELECT Count (1)
    FROM @Set_Has_Group Set_Has_Group
    WHERE 1=1
     AND sets.Id = Set_Has_Group.Set_Id
     AND Set_Has_Group.Group_ID IN (1, 2, 3)
    Group by Set_Id
    )
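
As a side note on the scalability point, one way to avoid editing the query for every new list of groups is to build the IN list from bound parameters in client code. The sketch below is an assumption-laden illustration, not the table-variable approach itself: it uses Python's sqlite3 module rather than SQL Server, and the helper name sets_in_all_groups is hypothetical. The sample rows mirror the ones above (sets 1 to 3 in groups 1, 2, 3; sets 4 and 5 in groups 1, 2).

```python
import sqlite3

def sets_in_all_groups(conn, group_ids):
    """Return rows of sets that belong to every group in group_ids (hypothetical helper)."""
    wanted = sorted(set(group_ids))  # drop duplicates so the count comparison stays correct
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)  # one bound parameter per group id
    sql = f"""
        SELECT s.id, s.column1, s.column2
        FROM sets s
        JOIN set_has_group g ON g.set_id = s.id
        WHERE g.group_id IN ({placeholders})
        GROUP BY s.id, s.column1, s.column2
        HAVING COUNT(DISTINCT g.group_id) = ?
        ORDER BY s.id
    """
    return conn.execute(sql, (*wanted, len(wanted))).fetchall()

# Sample data mirroring the table variables above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sets (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)")
conn.execute("CREATE TABLE set_has_group (set_id INTEGER, group_id INTEGER)")
conn.executemany(
    "INSERT INTO sets VALUES (?, ?, ?)",
    [(i, f"Column1 at Row {i}", f"Column2 at Row {i}") for i in range(1, 6)],
)
memberships = [(s, g) for s in (1, 2, 3) for g in (1, 2, 3)]
memberships += [(s, g) for s in (4, 5) for g in (1, 2)]
conn.executemany("INSERT INTO set_has_group VALUES (?, ?)", memberships)

all_three = [row[0] for row in sets_in_all_groups(conn, [1, 2, 3])]
first_two = [row[0] for row in sets_in_all_groups(conn, [1, 2, 2])]
print(all_three, first_two)
```

Only the placeholders are interpolated into the SQL text; the group ids themselves are passed as parameters, so the list can grow without touching the query.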
Raj More
A: 

Here's a solution that uses a non-correlated subquery and no GROUP BY:

SELECT column1, column2 
FROM sets 
WHERE id IN (
  SELECT g1.set_id FROM set_has_group g1
  JOIN set_has_group g2 ON (g1.set_id = g2.set_id)
  JOIN set_has_group g3 ON (g1.set_id = g3.set_id)
  WHERE g1.group_id = 1 AND g2.group_id = 2 AND g3.group_id = 3);
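
As a quick sanity check, the self-join (with g2 and g3 each joined to g1 on set_id) can be compared with the three-IN query from the question on a handful of made-up rows. This is a minimal sketch that assumes SQLite through Python's sqlite3 module; it verifies that the two queries agree on this sample, not how they perform.

```python
import sqlite3

# Hypothetical data: sets 1 and 2 have groups 1-3; set 3 has only groups 1 and 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sets (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT);
    CREATE TABLE set_has_group (set_id INTEGER, group_id INTEGER);
    INSERT INTO sets VALUES (1, 'a1', 'a2'), (2, 'b1', 'b2'), (3, 'c1', 'c2');
    INSERT INTO set_has_group VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 2), (2, 3),
        (3, 1), (3, 3);
""")

# Self-join: one alias of set_has_group per required group, all tied to the same set_id.
self_join = conn.execute("""
    SELECT id FROM sets
    WHERE id IN (
        SELECT g1.set_id FROM set_has_group g1
        JOIN set_has_group g2 ON g1.set_id = g2.set_id
        JOIN set_has_group g3 ON g1.set_id = g3.set_id
        WHERE g1.group_id = 1 AND g2.group_id = 2 AND g3.group_id = 3)
    ORDER BY id
""").fetchall()

# The original three-IN formulation from the question, for comparison.
three_in = conn.execute("""
    SELECT id FROM sets
    WHERE id IN (SELECT set_id FROM set_has_group WHERE group_id = 1)
      AND id IN (SELECT set_id FROM set_has_group WHERE group_id = 2)
      AND id IN (SELECT set_id FROM set_has_group WHERE group_id = 3)
    ORDER BY id
""").fetchall()

print(self_join, three_in)
```

The self-join needs one extra alias per required group, so like the three-IN version it has to be regenerated when the number of groups changes.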
Bill Karwin