views:

111

answers:

3

I am writing a a report and was wondering if there is there any way to find items that join into identical sets?

For example in the following query, I want to find all areas that join against the same set of products:

SELECT Area.Name, AggregateSetOfProductsId
FROM Area
INNER JOIN AreaToProduct ON AreaToProduct.AreaId = Area.Id
GROUP BY Area.Id

The current best solution that I could think of was writing a custom aggregate that hashed the ProductId uniqueidentifiers together (something like shift right based on some byte then xor them together) it but I figured I'd ask if there was an easier way before going through it.

A: 

Perhaps use the EXCEPT operator? If the EXCEPT between the two sets is empty, the sets are identical. See http://msdn.microsoft.com/en-us/library/ms188055.aspx Also, to hash a resultset you can use built in aggregates like CHECKSUM_AGG(BINARY_CHECKSUM(*)), see http://msdn.microsoft.com/en-us/library/ms188920.aspx

Remus Rusanu
Ah, I didn't realize CHECKSUM_AGG worked. Thank you.
Chris
+1  A: 

You can easily do this using a Common Table Expression. You would employ FOR XML PATH to create an array, and then use that array to identify matching products. Like this:

;WITH AreaProductSets
AS
(
SELECT a.AreaID, a.Name, 
       SUBSTRING((SELECT (', ' + CAST(ap.ProductID AS varchar(20)))
            FROM AreaToProduct ap
            WHERE ap.AreaID = a.AreaID
            ORDER BY ap.ProductID
            FOR XML PATH ('')), 3, 2000) AS ProductArray
FROM     Area AS a
)
SELECT  SUBSTRING((SELECT  (', ' + CAST(aps2.Name AS varchar(20)))
            FROM   AreaProductSets aps2
            WHERE  aps2.ProductArray = aps.ProductArray
            ORDER BY aps2.AreaID
            FOR XML PATH('')), 3, 2000) AS MatchingAreas,
       aps.ProductArray
FROM     (SELECT DISTINCT ProductArray FROM AreaProductSets) AS aps
Aaron Alton
+1  A: 

You can do this with set operations. Here I show you how to obtain all orderid's from northwind which contain exactly the same products (which is what you're after, right? the areas with exactly the same products)

select distinct o1.orderid 
from orders o1 inner join orders o2 on o1.orderid!=o2.orderid
where not exists
    (
     select * from [order details] od2
     where od2.orderId=o2.Orderid
      and od2.ProductId NOT IN 
      (
       select productid from [order details] od1
       where od1.OrderId=o1.OrderId
      )
    )
    and not exists
    (
     select * from [order details] od1
     where od1.orderId=o1.Orderid
      and od1.ProductId NOT IN 
      (
       select productid from [order details] od2
       where od2.OrderId=o2.OrderId
      )
    )

The idea is basicly this: return order o1 for which an order o2 exists for which there aren't any products not in the list of products of o1 and where there aren't any products from o1 not in the list of products of o2.

Frans Bouma