I want to remove duplicate records from a table without using temp table how is it possible?
- Make a query A that gets you the rows that qualify as duplicates
- Make a query B that gets you the rows that you want to keep
- Make a query that deletes all rows that are in A but not in B.
Example.
Assuming a table named table
, with an autoincrement ID column named id
and a column named name
of which you want to remove the doubles. Of each name, you want to keep the oldest record (the one with the lowest ID).
Query A then looks something like:
SELECT * FROM table
WHERE name IN (SELECT name FROM table GROUP BY name HAVING COUNT(*) > 1)
Query B would be:
SELECT * FROM table
WHERE id IN (SELECT min(id) FROM table GROUP BY name)
Now combine these to form the delete query:
DELETE FROM table
WHERE name IN (SELECT name FROM table GROUP BY name HAVING COUNT(*) > 1)
AND NOT id IN (SELECT min(id) FROM table GROUP BY name)
In the example at hand, you could leave out the first query, but when things get more complicated, it's a nice extra safeguard.
You can't. Fully duplicate records (ones which have every field equal) can't be deleted by one, because you can't differentiate them in a Where
clause in a delete query.
The only way would be to do a select distinct
query to select all rows without duplicates, then insert them in an empty table.
If you don't have fully duplicate records, then the question is formulated incorrectly, and you don't want to delete duplicate records, because there aren't any. Rows that have an incomplete set of equal fields aren't duplicates. In this case you would want to delete rows that have some fields equal, in which case you would have to specify which ones to leave if a certain field or field set isn't equal.
You could do this without a problem by using a common table expression (CTE), you don't need to use any temporary tables at all. Just be careful if the delete is going against a high traffic table. Deleting large amounts of data can cause locking and blocking, also the tran log will be hit.
Note: This code was written without any testing, but should work (SQL 2005 and above).
/* Create test data with duplicates */
declare @TestTable Table (Col1 int)
insert into @TestTable
select 1 union all
select 1 union all
select 2 union all
select 3 union all
select 3 union all
select 4
;
/* Create CTE to number all duplicates (gives a running number to all identical values in Col1) */
with FindDupes as
(
Select Col1,ROW_NUMBER() over (partition by Col1 order by Col1) RN
from @TestTable
)
/* Delete the duplicates (anything that has a higher rownumber than one) */
Delete from FindDupes where RN>1
;
/* Select the remaining data from the table */
Select * from @TestTable