I'm working in SQL Server with the following sample problem. Brandon prefers PCs and Macs, Sue prefers PCs only, and Alan prefers Macs. The data would be represented something like this. I've had to make some compromises here, but hopefully you get the picture:

TABLE 1: User

uID (INT PK), uName (VARCHAR)
1        'Brandon'
2        'Sue'
3        'Alan'

TABLE 2: Computer

cID (INT PK), cName (VARCHAR)
1        'Mac'
2        'PC'

TABLE 3: UCPref --Stores the computer preferences for each user

uID (INT FK), cID (INT FK)
1        1
1        2
2        2
3        1

Now, if I want to select everyone who likes PCs OR Macs, that would be quite easy. There are a dozen ways to do it, but if I'm having a list of items fed in, then the IN clause is quite straightforward:

SELECT DISTINCT u.uName
FROM [User] u
INNER JOIN UCPref p ON u.uID = p.uID
WHERE p.cID IN (1,2)

The problem I have is: what happens when I ONLY want to select people who like BOTH PCs AND Macs? I can do it with multiple subqueries, but that isn't very scalable.

SELECT u.uName
FROM [User] u
WHERE u.uID IN (SELECT uID FROM UCPref WHERE cID = 1)
AND u.uID IN (SELECT uID FROM UCPref WHERE cID = 2)

How does one write this query so that it returns the users who prefer multiple computers, taking into consideration that there may be hundreds, maybe thousands, of different kinds of computers (meaning one subquery per item is out)? If only you could modify the IN clause with a keyword like 'ALL' to indicate that you want to match only those records that have all of the items in the parentheses:

SELECT u.uName
FROM User u
INNER JOIN UCPref p ON u.uID = p.uID
WHERE cID IN *ALL* (1,2)
A: 

Using JOINs:

SELECT u.uname
  FROM [User] u
  JOIN UCPREF ucp_mac ON ucp_mac.uid = u.uid
  JOIN COMPUTER mac ON mac.cid = ucp_mac.cid
                   AND mac.cname = 'Mac'
  JOIN UCPREF ucp_pc ON ucp_pc.uid = u.uid
  JOIN COMPUTER pc ON pc.cid = ucp_pc.cid
                  AND pc.cname = 'PC'

I'm using table aliases, because I'm JOINing onto the same table twice.

Using EXISTS:

SELECT u.uname
  FROM [User] u
 WHERE EXISTS (SELECT NULL
                 FROM UCPREF ucp
                WHERE ucp.uid = u.uid
                  AND ucp.cid IN (1, 2)
             GROUP BY ucp.uid
               HAVING COUNT(DISTINCT ucp.cid) = 2)

If you're going to use the IN clause, you have to use GROUP BY/HAVING, and the HAVING count must equal the number of items in the IN list. There is a risk in the COUNT(), though. Some databases may not accept anything but * there, while SQL Server and MySQL allow COUNT(DISTINCT ...). The problem is that COUNT(*) counts rows: if UCPREF holds a duplicate row, a user with two instances of the value 2 would also reach a count of 2, and it would be valid to SQL, giving you a false positive.
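To make the false positive concrete, here is a minimal sketch rather than a definitive implementation. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption, made only so it runs anywhere), and the duplicated row for Sue is hypothetical. The helper name matching_users is mine, not part of any library.

```python
import sqlite3

# SQLite stands in for SQL Server here (assumption, so the sketch is runnable).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UCPref (uID INTEGER, cID INTEGER);  -- no unique constraint, on purpose
    INSERT INTO UCPref VALUES (1, 1), (1, 2);  -- Brandon: Mac and PC
    INSERT INTO UCPref VALUES (2, 2), (2, 2);  -- Sue: PC only, entered twice (hypothetical duplicate)
    INSERT INTO UCPref VALUES (3, 1);          -- Alan: Mac only
""")

# Same query twice; only the expression inside HAVING changes.
template = """
    SELECT uID
      FROM UCPref
     WHERE cID IN (1, 2)
  GROUP BY uID
    HAVING {count_expr} = 2
  ORDER BY uID
"""

def matching_users(count_expr):
    """Run the template with the given COUNT expression and return the uIDs."""
    rows = conn.execute(template.format(count_expr=count_expr)).fetchall()
    return [uid for (uid,) in rows]

rows_counted = matching_users("COUNT(*)")                # counts rows, duplicates included
distinct_counted = matching_users("COUNT(DISTINCT cID)") # counts different computers

print(rows_counted)      # [1, 2]  Sue slips in because of the duplicate row
print(distinct_counted)  # [1]     only Brandon has both
```

If the table has a unique constraint on (uID, cID), the duplicate cannot occur and COUNT(*) is safe.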

OMG Ponies
Thanks for the quick response! Your first example is similar to mine: it works for two items but is not scalable. You could always build the string in a SPROC and then exec it, but I try to stay away from that. The GROUP BY was my first guess, as it's scalable to the extent that it doesn't require the SQL to be rewritten depending on the number of items being checked. I do have some performance concerns with the GROUP BY clause for large sets of data, but if this is the best approach, maybe that isn't such a problem, assuming the GROUP BY columns are properly indexed?
Brandon
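
On the point about not rewriting the SQL per item count, a rough sketch of the GROUP BY approach driven by a variable-length list. Assumptions: Python's sqlite3 stands in for SQL Server, the function name users_preferring_all is hypothetical, and the sample rows follow the description in the question (Brandon both, Sue PC only, Alan Mac only). The composite primary key on (uID, cID) both rules out duplicate rows, which is what makes COUNT(*) safe here, and gives the engine an index to work with. Whether that index is actually used depends on the engine and the data, so check the execution plan.

```python
import sqlite3

def users_preferring_all(conn, wanted_cids):
    """Return names of users who have a UCPref row for every cID in wanted_cids."""
    wanted = sorted(set(wanted_cids))  # de-duplicate so the target count is right
    if not wanted:
        return []
    # One bound parameter per item; the SQL text itself never contains the values.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT u.uName
          FROM "User" u
          JOIN UCPref p ON p.uID = u.uID
         WHERE p.cID IN ({placeholders})
      GROUP BY u.uID, u.uName
        HAVING COUNT(*) = ?
      ORDER BY u.uName
    """
    return [name for (name,) in conn.execute(sql, (*wanted, len(wanted)))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "User" (uID INTEGER PRIMARY KEY, uName TEXT);
    CREATE TABLE UCPref (
        uID INTEGER REFERENCES "User"(uID),
        cID INTEGER,
        PRIMARY KEY (uID, cID)  -- blocks duplicate rows and provides an index
    );
    INSERT INTO "User" VALUES (1, 'Brandon'), (2, 'Sue'), (3, 'Alan');
    INSERT INTO UCPref VALUES (1, 1), (1, 2), (2, 2), (3, 1);
""")

both = users_preferring_all(conn, [1, 2])  # Mac AND PC
mac = users_preferring_all(conn, [1])      # Mac, with or without anything else
print(both)  # ['Brandon']
print(mac)   # ['Alan', 'Brandon']
```

For very long lists, loading the wanted IDs into a table and joining to it in place of the IN list is a common alternative; the HAVING comparison stays the same.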