I have the table below, with the following records in it:

create table employee
(
 EmpId number,
 EmpName varchar2(10),
 EmpSSN varchar2(11)
);

insert into employee values(1, 'Jack', '555-55-5555');
insert into employee values (2, 'Joe', '555-56-5555');
insert into employee values (3, 'Fred', '555-57-5555');
insert into employee values (4, 'Mike', '555-58-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6, 'Lisa', '555-70-5555');
insert into employee values (1, 'Jack', '555-55-5555');
insert into employee values (4, 'Mike', '555-58-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6 ,'Lisa', '555-70-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6, 'Lisa', '555-70-5555');

I don't have a primary key on this table, but the records above are already in it. I want to remove the duplicate records that have the same values in the EmpId and EmpSSN fields.

Example: EmpId 5

Can anyone help me write a query to delete those duplicate records?

Thanks in advance

+5  A: 

You could create a temporary table, #tempemployee, containing a SELECT DISTINCT of your employee table. Then delete everything from employee, and insert the rows back into employee by selecting from #tempemployee.

Like Josh said, even if you know which rows are duplicated, deleting them will be impossible, since you cannot actually refer to a specific record if it is an exact duplicate of another record.
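
For what it's worth, here is a minimal sketch of the copy-out/copy-back idea described above. It assumes SQLite driven from Python's sqlite3 module (not the OP's database) purely so it can be run as-is; the temp table is called tempemployee because the # prefix is SQL Server syntax, and the helper name is made up for illustration.

```python
import sqlite3

# The twelve sample rows from the question (six distinct employees).
ROWS = [
    (1, "Jack", "555-55-5555"), (2, "Joe", "555-56-5555"),
    (3, "Fred", "555-57-5555"), (4, "Mike", "555-58-5555"),
    (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
    (1, "Jack", "555-55-5555"), (4, "Mike", "555-58-5555"),
    (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
    (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
]


def dedupe_via_temp_table(conn):
    """Hypothetical helper: copy distinct rows aside, empty the table, copy back."""
    conn.execute("CREATE TEMP TABLE tempemployee AS SELECT DISTINCT * FROM employee")
    conn.execute("DELETE FROM employee")
    conn.execute("INSERT INTO employee SELECT * FROM tempemployee")
    conn.execute("DROP TABLE tempemployee")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmpId INTEGER, EmpName TEXT, EmpSSN TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", ROWS)

dedupe_via_temp_table(conn)

remaining = conn.execute(
    "SELECT EmpId, EmpName, EmpSSN FROM employee ORDER BY EmpId"
).fetchall()
print(remaining)
```

Note that SELECT DISTINCT compares whole rows, so this only collapses rows that are identical in every column, which is the case in the OP's sample.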

Daren Thomas
Only trick there is if the names are different but the ID/SSN match. You'd have to somehow pick one because distinct wouldn't help there.
Josh Einstein
+1 this is the most straightforward and portable solution. OP does not state what brand of database he uses.
Bill Karwin
@Josh: from the OP's sample, it looks like that's not an issue. The duplicate rows are identical in all columns.
Bill Karwin
A: 

I'm not an SQL expert so bear with me. I'm sure you'll get a better answer soon enough. Here's how you can find the duplicate records.

-- Each (empid, empssn) pair that occurs more than once, with its row count.
-- No self-join is needed; joining the table to itself would inflate the counts.
select empid, empssn, count(*)
from employee
group by empid, empssn
having count(*) > 1

Deleting them will be trickier, because there is nothing in the data that you could use in a delete statement to differentiate the duplicates. I suspect the answer will involve row_number() or adding an identity column.
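
As a rough illustration of the row_number() idea, the sketch below numbers the copies within each (EmpId, EmpSSN) group, so anything numbered 2 or higher is a surplus copy. It assumes SQLite 3.25+ (for window functions) through Python's sqlite3 module rather than the OP's database, and copy_no is just a made-up column alias.

```python
import sqlite3

# The twelve sample rows from the question.
ROWS = [
    (1, "Jack", "555-55-5555"), (2, "Joe", "555-56-5555"),
    (3, "Fred", "555-57-5555"), (4, "Mike", "555-58-5555"),
    (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
    (1, "Jack", "555-55-5555"), (4, "Mike", "555-58-5555"),
    (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
    (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmpId INTEGER, EmpName TEXT, EmpSSN TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", ROWS)

# Number the rows inside each (EmpId, EmpSSN) group, then keep only
# the rows numbered 2 and up, i.e. the surplus copies.
extras = conn.execute(
    """
    SELECT EmpId, EmpSSN, copy_no
    FROM (
        SELECT EmpId, EmpSSN,
               ROW_NUMBER() OVER (
                   PARTITION BY EmpId, EmpSSN ORDER BY EmpId
               ) AS copy_no
        FROM employee
    )
    WHERE copy_no > 1
    ORDER BY EmpId, copy_no
    """
).fetchall()
print(extras)
```

This only identifies the surplus copies. Actually deleting them still needs some row identifier (or a dialect that allows deleting through such a query), which is what the other answers deal with.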

Josh Einstein
+13  A: 

Add a Primary Key (code below)

Run the correct delete (code below)

Consider WHY you wouldn't want to keep that primary key.


Assuming MSSQL or compatible:

ALTER TABLE Employee ADD EmployeeID int identity(1,1) PRIMARY KEY;

WHILE EXISTS (SELECT COUNT(*) FROM Employee GROUP BY EmpID, EmpSSN HAVING COUNT(*) > 1)
BEGIN
    DELETE FROM Employee WHERE EmployeeID IN 
    (
        SELECT MIN(EmployeeID) as [DeleteID]
        FROM Employee
        GROUP BY EmpID, EmpSSN
        HAVING COUNT(*) > 1
    )
END
ck
+1: to quote some SQL god: "if it doesn't have a primary key, it's not a table"
marc_s
+1 A primary key identifies a row. No PK = no sense. @marc_s: a clustered index differentiates a table from a heap. No PK simply means no data integrity
gbn
@gbn: even a heap is considered a table :-) This quote was more along the lines: unless you specify a primary key, a table really doesn't have much usefulness (except in edge cases like bulk import / temporary tables etc.)
marc_s
Even in those edge cases I almost always add a primary key, just so I can delete duplicate records if need be.
HLGEM
Looks like the duplicate removal is being done so that EmpID can become the primary key. The other data seems dependent on it.
foriamstu
+5  A: 

Use the ROWID pseudocolumn to differentiate between duplicate records (this assumes Oracle, which the varchar2 and number types in the question suggest). Keep the lowest ROWID for each EmpID/EmpSSN pair and delete the rest:

    DELETE FROM Employee a
     WHERE a.ROWID <> ( SELECT MIN( b.ROWID )
                          FROM Employee b
                         WHERE a.EmpID  = b.EmpID
                           AND a.EmpSSN = b.EmpSSN )
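
A runnable sketch of the same "keep one physical row per EmpID/EmpSSN pair" idea, for anyone who wants to try it out. It assumes SQLite via Python's sqlite3 module, where the implicit rowid column plays the role of the row identifier, and it uses an equivalent NOT IN form instead of a correlated subquery.

```python
import sqlite3

# The twelve sample rows from the question.
ROWS = [
    (1, "Jack", "555-55-5555"), (2, "Joe", "555-56-5555"),
    (3, "Fred", "555-57-5555"), (4, "Mike", "555-58-5555"),
    (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
    (1, "Jack", "555-55-5555"), (4, "Mike", "555-58-5555"),
    (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
    (5, "Cathy", "555-59-5555"), (6, "Lisa", "555-70-5555"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmpId INTEGER, EmpName TEXT, EmpSSN TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", ROWS)

# Keep the row with the smallest rowid in each (EmpId, EmpSSN) group
# and delete every other row in that group.
cursor = conn.execute(
    """
    DELETE FROM employee
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM employee
        GROUP BY EmpId, EmpSSN
    )
    """
)
deleted = cursor.rowcount
conn.commit()

remaining = conn.execute(
    "SELECT EmpId, EmpName, EmpSSN FROM employee ORDER BY EmpId"
).fetchall()
print(deleted, remaining)
```

No structural change to the table is needed, since the row identifier is supplied by the database itself.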
Paul Morgan
+1 A good solution to avoid having to make structural changes
foriamstu
A: 

to delete duplicate rows in sql server with primary key

A: 

-- Lists each fully duplicated row once. The employee table has no salary
-- column, so the comparison uses empssn; no self-join is needed.
select empid, empname, empssn
from employee
group by empid, empname, empssn
having count(*) > 1

tejas rupani
A: 

If you don't want to create a new primary key, you can use the TOP clause in SQL Server to delete one duplicate row at a time:

declare @ID int
while EXISTS(select count(*) from Employee group by EmpId having count(*)> 1)
begin
    select top 1 @ID = EmpId
    from Employee 
    group by EmpId
    having count(*) > 1

    DELETE TOP(1) FROM Employee WHERE EmpId = @ID
end
Joe
A: 

create unique clustered index Employee_idx on Employee ( EmpId,EmpSSN ) with ignore_dup_key

You can drop the index if you don't need it.

Sudhir