tags:

views:

46

answers:

5

I have a table People (First_Name, Last_Name). This table has records that are duplicated as in the example (not all rows are duplicated):

First_Name  Last_Name
John        Smith
Alec        Baldwin
Smith       John
Angelo      Gordon
Mary        Bush
Bush        Mary

How do I select all distinct people? In the final output of the query John Smith should appear only once (it’s not import if in the final query there is John Smith or Smith John).

Thank you.

+1  A: 

Sorry was missundertanding your question on the first try...

WITH People (Firstname, Lastname)
AS
(
    SELECT 'John' AS Firstname, 'Smith' AS Lastname UNION
    SELECT 'John' AS Firstname, 'Smith' AS Lastname UNION
    SELECT 'Alec' AS Firstname, 'Baldwin' AS Lastname UNION
    SELECT 'Smith' AS Firstname, 'John' AS Lastname UNION
    SELECT 'John' AS Firstname, 'Smith' AS Lastname UNION
    SELECT 'Angelo' AS Firstname, 'Gordon' AS Lastname UNION
    SELECT 'Mary' AS Firstname, 'Bush' AS Lastname UNION
    SELECT 'Bush' AS Firstname, 'Mary' AS Lastname
)
SELECT p1.* FROM People p1 
LEFT OUTER JOIN People p2 ON p2.Firstname = p1.Lastname AND p2.Lastname = p1.Firstname AND p2.Firstname < p1.Firstname 
WHERE p2.Firstname IS NULL  
Yves M.
Are you ignoring that 'Mary Bush' should be grouped with 'Bush Mary'?
Tobiasopdenbrouw
Yes i think he is...
Ben Robinson
No. the first_name and last_name attributes need to louse their logic. In the final answer there must be a John Smith record for both John Smith and Smith John initial records.
Mihai Om
Sorry my fault. Missunderstood the question completely.
Yves M.
+4  A: 

Just pick an ordering and apply it across everyone. Then use a union that will eliminate duplicates anyway

select FirstName,LastName from People where FirstName <= LastName
union
select LastName,FirstName from People where LastName < FirstName
Damien_The_Unbeliever
Nice one. Would you use this version in RL over the answer I posted? (And why?)
Tobiasopdenbrouw
This is it. Thank you!
Mihai Om
@Tobias - Well, it was the first answer I thought of, and I didn't see anyone else posting similar. I'd note that yours does depend on a special character (" ") not appearing within these columns, which may or may not be relevant. Depending on the size of the table, and the intelligence of the optimizer, this may lead to two scans of the table, but I think yours has the same problems going for it - that subquery is going to scan the whole table I think.
Damien_The_Unbeliever
Clean idea. Thumb up! Maybe it can have performance problems but I think other solutions would have bigger troubles with that, too.
Petr Kozelek
Thanks for both the answer and the clarification comment.
Tobiasopdenbrouw
+3  A: 

This is one way to do it using pretty much any SQL flavor.

DECLARE @Names TABLE (
  First_Name VARCHAR(32)
  , Last_Name VARCHAR(32)
)  

INSERT INTO @Names VALUES ('John', 'Smith')
INSERT INTO @Names VALUES ('Alec', 'Baldwin')
INSERT INTO @Names VALUES ('Smith', 'John')
INSERT INTO @Names VALUES ('Angelo', 'Gordon')
INSERT INTO @Names VALUES ('Mary', 'Bush')
INSERT INTO @Names VALUES ('Bush', 'Mary')

Using a JOIN

SELECT  n1.*
FROM    @Names n1
        LEFT OUTER JOIN @Names n2 ON n2.First_Name = n1.Last_Name
                                     AND n2.Last_Name = n1.First_Name
                                     AND n2.First_Name < n1.First_Name
WHERE   n2.First_Name IS NULL                                     

or NOT EXISTS

SELECT  n1.*
FROM    @Names n1
WHERE   NOT EXISTS (
          SELECT  *  
          FROM    @Names n2 
          WHERE   n2.First_Name = n1.Last_Name
                  AND n2.Last_Name = n1.First_Name
                  AND n2.First_Name < n1.First_Name
        )
Lieven
A: 

Here's a solution which uses Oracle functions. Other flavours of SQL will have the same or very similar functions:

SQL> select * from t23
  2  /

FIRST_NAME                     LAST_NAME
------------------------------ ------------------------------
John                           Smith
Alec                           Baldwin
Smith                          John
Angelo                         Gordon
Mary                           Bush
Bush                           Mary

6 rows selected.

SQL> select distinct least(first_name, last_name)
  2                  , greatest(first_name, last_name)
  3  from t23
  4  /

LEAST(FIRST_NAME,LAST_NAME)    GREATEST(FIRST_NAME,LAST_NAME)
------------------------------ ------------------------------
Alec                           Baldwin
Bush                           Mary
John                           Smith
Angelo                         Gordon

SQL> 
APC
A: 

I think this might work in MS-SQL

select * from People

where (FirstName + "," + LastName) <> (LastName + "," + FirstName)

Rohit