views:

360

answers:

2

Hello,

First - apologies for the fuzzy title, I could not find a better one.

I have table with the following structure (simplification):

EmpID DeptID

1     1
1     2
2     1
3     2
4     5
5     2

This table represents a many-to-many relationship.

I'm interested in finding all the EmpIDs that are related to a specific group of DeptIDs, for example I want all the EmpIDs that are related to DeptIDs 1, 2 and 3. Please note it's an AND relationship and not an OR relationship. For my case, the EmpID may be related to additional DeptIDs besides 1, 2 and 3 for it to be a valid answer.

The number of DeptIDs I'm interested in changes (i.e. I may want EmpIDs who're related to both DeptID 3 and 5, or I may want EmpIDs related to DepIDs 2, 3, 4, 5, 6, 7).

When I try to approach this problem I find myself either creating a JOIN per DepID, or a subquery per DeptID. This would mean I have to generate a new query per the number of DeptIDs I'm testing against. I would obviously prefer having a static query with a parameter or set of parameters.

I'm working over both SQL Server and MySQL (developing in parallel two versions of my code).

Any ideas?

+4  A: 

I'm assuming you want to find employees that are in ALL of the specified departments and not just the employees that are in ANY of the departments, which is a far easier query.

SELECT EmpID
FROM mytable t1
JOIN mytable t2 ON t1.EmpID = t2.EmpID AND t2.DeptID = 2
JOIN mytable t3 ON t2.EmpID = t3.EmpID AND t3.DeptID = 3
WHERE DeptID = 1

I'm going to preempt the inevitable suggestion that'll come to use aggregation:

SELECT EmpID
FROM mytable
WHERE DeptID IN (1,2,3)
GROUP BY EmpID
HAVING COUNT(1) = 3

Resist that temptation. It's significantly slower. A similar scenario to this came up in SQL Statement - “Join” Vs “Group By and Having” and the second version was, in that second, about twenty times slower.

I'd also suggest you look at Database Development Mistakes Made by AppDevelopers.

cletus
How do you know that twenty times faster is significantly better? Maybe speed doesn't matter all that much.
Walter Mitty
+3  A: 

I'd start from something like:

SELECT EmpID, COUNT(*) AS NumDepts
FROM thetable
WHERE DeptID IN (1, 2, 3)
GROUP BY EmpId
HAVING COUNT(*) == 3

of course, that 3 in the last line would always be the length of the sequence of department ids you're checking (so for (2,3,4,5,6,7) it would be 6). This is one natural way to express "employees connected to all of these departments".

Edit: I see a note in another answer about performance issues -- I've tried this approach in SQLite and PostgreSQL, with appropriate indices, and there it looks like it's performing well and with appropriate use of all said indices; and in MySQL 5.0, where I have to admit performance was nowhere as good.

I suspect (without an opportunity to benchmark this on a zillion more engines;-) that other really good SQL engines (such as SQL Server 2008, Oracle, IBM DB2, the new open-source Ingres...) will also optimize this query well, while other mediocre ones (can't think of any with a popularity anywhere close to MySQL's) won't.

So, no doubt your favorite answer will depend on what engines you really care about (this takes me back to the time, over a decade ago, when my responsibilities included managing the team which maintained a component that was supposed to provide well-performing queries over more than half a dozen disparate engines -- talk about nightmare jobs...!-).

Alex Martelli