views:

83

answers:

3

I've got a table in a testing DB that someone apparently got a little too trigger-happy on when running INSERT scripts to set it up. The schema looks like this:

ID UNIQUEIDENTIFIER
TYPE_INT SMALLINT
SYSTEM_VALUE SMALLINT
NAME VARCHAR
MAPPED_VALUE VARCHAR

It's supposed to have a few dozen rows. It has about 200,000, most of which are duplicates in which TYPE_INT, SYSTEM_VALUE, NAME and MAPPED_VALUE are all identical and ID is not.

Now, I could probably make a script to clean this up that creates a temporary table in memory, uses INSERT .. SELECT DISTINCT to grab all the unique values, TRUNCATE the original table and then copy everything back. But is there a simpler way to do it, like a DELETE query with something special in the WHERE clause?

+4  A: 

You don't give your table name but I think something like this should work. Just leaving the record which happens to have the lowest ID. You might want to test with the ROLLBACK in first!

BEGIN TRAN
DELETE <table_name>
FROM <table_name> T1
WHERE EXISTS(
SELECT * FROM <table_name> T2 
WHERE     
T1.TYPE_INT = T2.TYPE_INT  AND
T1.SYSTEM_VALUE = T2.SYSTEM_VALUE  AND
T1.NAME = T2.NAME  AND
T1.MAPPED_VALUE = T2.MAPPED_VALUE  AND
T2.ID > T1.ID
)

SELECT * FROM <table_name>

ROLLBACK
Martin Smith
+2  A: 
WITH Duplicates(ID , TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE )
AS
(
SELECT  Min(Id)  ID  TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE 
FROM T1
GROUP BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
HAVING Count(Id) > 1
)
DELETE FROM T1
WHERE ID IN (
SELECT T1.Id
FROM T1
INNER JOIN Duplicates
ON T1.TYPE_INT = Duplicates.TYPE_INT
AND T1.SYSTEM_VALUE = Duplicates.SYSTEM_VALUE
AND T1.NAME = Duplicates.NAME
AND T1.MAPPED_VALUE = Duplicates.MAPPED_VALUE
AND T1.Id <> Duplicates.ID
) 
Glennular
+2  A: 

here is a great article on that: Deleting duplicates, which basically uses this pattern:

WITH    q AS
        (
        SELECT  d.*,
                ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
        FROM    t_duplicate d
        )
DELETE
FROM    q
WHERE   rn > 1

SELECT  *
FROM    t_duplicate
KM