views:

41

answers:

3

I want to remove duplicate records from a table without using temp table how is it possible?

+2  A: 
  1. Make a query A that gets you the rows that qualify as duplicates
  2. Make a query B that gets you the rows that you want to keep
  3. Make a query that deletes all rows that are in A but not in B.

Example.

Assuming a table named table, with an autoincrement ID column named id and a column named name of which you want to remove the doubles. Of each name, you want to keep the oldest record (the one with the lowest ID).

Query A then looks something like:

SELECT * FROM table 
WHERE name IN (SELECT name FROM table GROUP BY name HAVING COUNT(*) > 1)

Query B would be:

SELECT * FROM table 
WHERE id IN (SELECT min(id) FROM table GROUP BY name)

Now combine these to form the delete query:

DELETE FROM table 
WHERE name IN (SELECT name FROM table GROUP BY name HAVING COUNT(*) > 1)
AND NOT id IN (SELECT min(id) FROM table GROUP BY name)

In the example at hand, you could leave out the first query, but when things get more complicated, it's a nice extra safeguard.

tdammers
A: 

You can't. Fully duplicate records (ones which have every field equal) can't be deleted by one, because you can't differentiate them in a Where clause in a delete query.
The only way would be to do a select distinct query to select all rows without duplicates, then insert them in an empty table.

If you don't have fully duplicate records, then the question is formulated incorrectly, and you don't want to delete duplicate records, because there aren't any. Rows that have an incomplete set of equal fields aren't duplicates. In this case you would want to delete rows that have some fields equal, in which case you would have to specify which ones to leave if a certain field or field set isn't equal.

Alexander
I don't think that this is true. Suppose that you select all from a table, plus a ROW_NUMBER() column. The rows would be returned in clustered index order, and you would get a different row number value for each one. You could then use that to differentiate duplicates in a WHERE clause.
Yellowfog
row_number is a function that works with an order by statement, which is only allowed in a select query. That's just one of the reasons why row_number can't participate in a delete statement directly. One of the solutions I found on the internet, was about creating a separate table, introducing row_number as a new field, then perform the delete statement. This is not what the author wants.
Alexander
A: 

You could do this without a problem by using a common table expression (CTE), you don't need to use any temporary tables at all. Just be careful if the delete is going against a high traffic table. Deleting large amounts of data can cause locking and blocking, also the tran log will be hit.

Note: This code was written without any testing, but should work (SQL 2005 and above).

/* Create test data with duplicates */
declare @TestTable Table (Col1 int)

insert into @TestTable

select 1 union all
select 1 union all
select 2 union all
select 3 union all
select 3 union all
select 4
;
/* Create CTE to number all duplicates (gives a running number to all identical values in Col1) */
with FindDupes as
(
Select Col1,ROW_NUMBER() over (partition by Col1 order by Col1) RN

from @TestTable
)
/* Delete the duplicates (anything that has a higher rownumber than one) */
Delete from FindDupes where RN>1
;

/* Select the remaining data from the table */
Select * from @TestTable
WilliamD