tags:

views:

180

answers:

4

Given the following two tables in SQL Server 2005:

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'GroupItems')
    DROP TABLE GroupItems;
CREATE TABLE GroupItems (
    RowID INT IDENTITY(1,1) PRIMARY KEY
    , GroupID CHAR(1)
    , ItemID INT
);

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'ItemList')
    DROP TABLE ItemList;
CREATE TABLE ItemList (
    ItemID INT PRIMARY KEY
)


INSERT GroupItems ( GroupID, ItemID )
SELECT 'A', 1
UNION SELECT 'A', 2
UNION SELECT 'A', 3
UNION SELECT 'A', 4
UNION SELECT 'B', 1
UNION SELECT 'B', 2
UNION SELECT 'B', 4
UNION SELECT 'C', 1
UNION SELECT 'C', 2
UNION SELECT 'D', 1
UNION SELECT 'D', 4
UNION SELECT 'D', 5


INSERT ItemList ( ItemID )
SELECT 1
UNION SELECT 2
UNION SELECT 4

I'm trying to find the GroupID(s) from table GroupItems where the ItemIDs are an exact match for the contents of table ItemList.

In the sample data, the result should be Group 'B'.

Group A is rejected because it contains an item that is not in the ItemList table.

Group C is rejected because it doesn't contain all the items in the ItemList table.

Group D is rejected for both reasons.


Currently, I'm doing something like

DECLARE @ListCount INT;
SELECT @ListCount = COUNT(*) FROM ItemList;

SELECT GI.GroupID FROM GroupItems AS GI
INNER JOIN ItemList AS IL ON IL.ItemID = GI.ItemID
INNER JOIN ( SELECT GroupID FROM GroupItems 
             GROUP BY GroupID
             HAVING COUNT(*) = @ListCOunt ) AS GS ON GS.GroupID = GI.GroupID  
GROUP BY GI.GroupID 
HAVING COUNT(*) = @ListCount;

This function gives the correct result that I'm looking for, however, in my production environment, the GroupItems table has hundreds of thousands of rows and thousands of unique GroupIDs. The ItemList table usually contains about a dozen rows. This function is called fairly regularly. I'm looking for a more efficient way to get the same results.

A: 

Assuming:

  • there is no relevant information missing
  • ItemIDs are PKs, and therefore unique
  • You don't want GroupIDs where you have repeated group/item combinations

This should work:

select GroupID
from GroupItems
inner join ItemMaster
    on GroupItems.ItemID = ItemMaster.ItemID
inner join GroupMaster
    on GroupItems.GroupID = GroupMaster.GroupID
group by GroupID
having count(*) = (select count(*) from ItemList)

If there was a guarantee in GroupItems of unique group/item combinations, the join would be unneccessary.

Harper Shelby
This solution returns both A and B
Irwin M. Fletcher
That one doesn't work. On the sample data, it returns A and B. Only B should be returned. In T-Sql, the COUNT() function requires a parameter (typically *).I don't think I've missed anything relevant. If two groups contain the exact same items as ItemList, they should both be returned.
jsr
I see that now - however, your example appears (to me) to have an inconsistency that causes this. You have an ItemID in GroupItems that doesn't exist in ItemList. Is this expected? It seems to violate the concepts - though you haven't listed them explicitly, if I see an ItemList and a GroupItems table, I would expect the GroupItems table to have a foreign key relationship to ItemList (and to GroupList, for that matter).
Harper Shelby
Sorry for the confusion. Assume that the ItemID column in the GroupItems table and the ItemID in the ItemList table both have a foreign key to a third table "ItemMaster". There is no foreign key relationship between GroupItems and ItemList tables. It is common for GroupItems.ItemID to have values that are not contained in the ItemList table.
jsr
OK - 2 joins (to ItemMaster, and a guessed-at GroupMaster) to ensure that we've got only one entry per Group/Item combo, then match the counts. That ought to work.
Harper Shelby
A: 

Assuming :

  • ItemID value can be only > 0
SELECT t.GroupID
FROM (
  SELECT GroupItems.GroupID
        ,count(1) as groupItemsCount 
        ,min(IsNull(ItemList.ItemID, -1)) as minVal
  FROM GroupItems
      LEFT JOIN ItemList
              ON (GroupItems.ItemID = ItemList.ItemID)
  GROUP BY GroupID
) t
WHERE t.groupItemsCount = (SELECT COUNT(1) FROM ItemList)
  AND (t.minVal > 0)
DzheiZee
ItemID will always be > 0. I hadn't considered using a LEFT JOIN. I'll try it out and see how it performs compared to my original function.
jsr
A: 

Have you considered creating an indexed view to aggregate the counts on GroupItems?

CREATE VIEW GroupCounts (groupId, GroupCount) with SCHEMABINDING
AS
SELECT groupId, COUNT_BIG(1) /* I use 1 instead of asterisk by convention */
FROM GroupItems
GROUP BY groupId

CREATE CLUSTERED INDEX IX_GroupCounts on GroupCounts(groupId)

With this, you can use a similar query to the one you have, but it should have much better performance.

SELECT GS.groupId FROM GroupItems AS GI
INNER JOIN ItemList AS IL ON IL.ItemID = GI.ItemID
INNER JOIN GroupCounts AS GS ON GS.GroupID = GI.GroupID  
GROUP BY GS.GroupID 
HAVING COUNT(1) = groupCount;
Jeff Meatball Yang
A: 

Try:

   SELECT gi.groupid
     FROM GROUPITEMS gi
LEFT JOIN ITEMLIST il ON il.itemid = gi.itemid
     JOIN (SELECT COUNT(*) 'num_items' FROM ITEMLIST) c
 GROUP BY gi.groupid
   HAVING COUNT(*) = c.num_items
OMG Ponies