I know I'm late to the party, but I'm working on learning SQL and I wanted to try my hand at a solution and compare against the existing answers. I created a table Personnel
with some testing data.
My SQL Server only query uses CTEs and an INNER JOIN
:
WITH
Finance AS (SELECT SSN, Name FROM Personnel WHERE Org = 'Finance'),
IT AS (SELECT SSN, Name FROM Personnel WHERE Org = 'IT')
SELECT Finance.SSN, Finance.Name
FROM Finance
INNER JOIN IT ON IT.SSN = Finance.SSN
WHERE IT.Name != Finance.Name
Alexander's solution uses a straight INNER JOIN
. I rewrote it a little bit, putting the name comparison in the WHERE
clause, and dropping DISTINCT
because it's not required:
SELECT Finance.SSN, Finance.Name
FROM Personnel Finance
INNER JOIN Personnel IT ON Finance.SSN = IT.SSN
WHERE
(Finance.Org = 'Finance' AND IT.Org = 'IT') AND
(Finance.Name != IT.Name)
Andomar's solution using a correlated subquery inside an EXISTS
clause:
SELECT SSN, Name
FROM Personnel a
WHERE
(Org = 'Finance') AND
EXISTS
(
SELECT *
FROM Personnel b
WHERE (Org = 'IT') AND (a.SSN = b.SSN) AND (a.Name != b.Name)
)
barrylloyd's solution using a correlated subquery inside an IN
clause:
SELECT SSN, Name
FROM Personnel p1
WHERE
(Org = 'Finance') AND
SSN IN
(
SELECT SSN FROM Personnel p2
WHERE (Org = 'IT') AND (p1.Name != p2.Name)
)
I plugged all of these into SQL Server, and it turns out that queries 1 and 2 both generate the same query plan, and queries 3 and 4 generate the same query plan. The difference between the two groups is the former group actually does an INNER JOIN
internally, while the latter group does a left semi-join instead. (See here for an explanation of the different types of joins.)
I'm assuming there is a slight performance advantage favouring the left semi-join; however, for the business case, if you want to see any data columns from the right table (for example, if you want to display both names to compare them), you would have to completely rewrite those queries to use an INNER JOIN
-based solution.
So given all that, I would favour solution 2, because the performance is so similar to 3 and 4, and it's far more flexible than those as well. My solution makes the SELECT
statement very easy to read, but it's more verbose than 2 and not as portable. I suppose that mine might be better for readability if you have to do additional filtering on each of the two "sub-tables," or if the results of this query are going to be used as an intermediate step to a further goal.