tags:

views:

75

answers:

4
+2  Q: 

Simple SQL Problem

I have a SQL query that I cant wrap my mind around. I do not have a large amount of sql experience. So I need some help

I have a table XXX:

Social Security No (SSN).
Name.
organisation. (Finance/IT)

In english what I want is:

To select all SSNs and Names in "Finance" where there is a different name for that SSN in "IT".

My not working attempt:

select ssn , name from XXX where org = "Finance" and name not in (select name from XXX where org="IT" and ssn=the_first_ssn)

Please help.


I have decided to make it a bit more difficult.

SSN can ocur multiple times in "IT":

So I want to select all SSNs and Names in Finance where the SSN does not exist with the same Name in "IT"

+3  A: 

You could use a subquery in an exists clause:

select  ssn, name
from    YourTable a
where   organisation = 'Finance'
        and exists
        (
        select  *
        from    YourTable b
        where   organisation = 'IT'
                and a.ssn = b.ssn
                and a.name <> b.name
        )

The subquery says there must be a row in IT with the same SSN but a different name.

Andomar
This is exactly what I was hoping for. It looks like i needed to store the table as "a", then I can rweference the sriginal querys data.Perfect.Thank you.
Mark
+1  A: 

Assuming ssn is a unique key...

select ssn, name 
from XXX XXX1
where org = "Finance" 
and ssn in 
(
select ssn 
from XXX XXX2
where org="IT"
and XXX1.name<>XXX2.name
)
barrylloyd
A: 
SELECT distinct
t1.ssn,
t1.name
from
xxx t1
inner join xxx t2 on t1.ssn=t2.ssn and t1.name<>t2.name
where t1.org='Finance' and t2.org='IT'
Alexander
A: 

I know I'm late to the party, but I'm working on learning SQL and I wanted to try my hand at a solution and compare against the existing answers. I created a table Personnel with some testing data.

  1. My SQL Server only query uses CTEs and an INNER JOIN:

    WITH
        Finance AS (SELECT SSN, Name FROM Personnel WHERE Org = 'Finance'),
        IT AS (SELECT SSN, Name FROM Personnel WHERE Org = 'IT')
    SELECT Finance.SSN, Finance.Name
        FROM Finance
        INNER JOIN IT ON IT.SSN = Finance.SSN
        WHERE IT.Name != Finance.Name
    
  2. Alexander's solution uses a straight INNER JOIN. I rewrote it a little bit, putting the name comparison in the WHERE clause, and dropping DISTINCT because it's not required:

    SELECT Finance.SSN, Finance.Name
        FROM Personnel Finance
        INNER JOIN Personnel IT ON Finance.SSN = IT.SSN
        WHERE
            (Finance.Org = 'Finance' AND IT.Org = 'IT') AND
            (Finance.Name != IT.Name)
    
  3. Andomar's solution using a correlated subquery inside an EXISTS clause:

    SELECT SSN, Name
        FROM Personnel a
        WHERE
            (Org = 'Finance') AND
            EXISTS
            (
                SELECT *
                    FROM Personnel b
                    WHERE (Org = 'IT') AND (a.SSN = b.SSN) AND (a.Name != b.Name)
            )
    
  4. barrylloyd's solution using a correlated subquery inside an IN clause:

    SELECT SSN, Name
        FROM Personnel p1
        WHERE
            (Org = 'Finance') AND
            SSN IN
            (
                SELECT SSN FROM Personnel p2
                    WHERE (Org = 'IT') AND (p1.Name != p2.Name)
            )
    

I plugged all of these into SQL Server, and it turns out that queries 1 and 2 both generate the same query plan, and queries 3 and 4 generate the same query plan. The difference between the two groups is the former group actually does an INNER JOIN internally, while the latter group does a left semi-join instead. (See here for an explanation of the different types of joins.)

I'm assuming there is a slight performance advantage favouring the left semi-join; however, for the business case, if you want to see any data columns from the right table (for example, if you want to display both names to compare them), you would have to completely rewrite those queries to use an INNER JOIN-based solution.

So given all that, I would favour solution 2, because the performance is so similar to 3 and 4, and it's far more flexible than those as well. My solution makes the SELECT statement very easy to read, but it's more verbose than 2 and not as portable. I suppose that mine might be better for readability if you have to do additional filtering on each of the two "sub-tables," or if the results of this query are going to be used as an intermediate step to a further goal.

Jon Seigel