I have a Microsoft SQL Server 2005 table where no entire row is duplicated, but the value in one column is.

1 aaa
1 bbb
1 ccc
2 abc
2 def

How can I delete all but one of the rows that share the same value in the first column?

To clarify, in this example I need to get rid of the second, third and fifth rows.

A: 

Try this.

DELETE FROM <TABLE_NAME_HERE> WHERE <SECOND_COLUMN_NAME_HERE> IN ('bbb', 'ccc', 'def');
Jass
Hmm... I hope muhan was after a more generic solution ;-)
mjv
He didn't specify whether he wanted to delete randomly or lexicographically, so... um, second, third and fifth deleted *grin*
Jass
Yes, that table was just an example. My real table is much larger and has many more duplicates.
muhan
muhan - what is the rule that decides which row to keep?
Mark
+2  A: 

Let's call these the id and the Col1 columns.

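-- Keeps the row with the greatest Col1 for each id; use < instead of > to keep the smallest.
DELETE FROM myTable
WHERE EXISTS
  (SELECT * FROM myTable T2
   WHERE T2.id = myTable.id AND T2.Col1 > myTable.Col1)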

Edit: As pointed out by Andomar, the above doesn't get rid of exact duplicates, where both id and Col1 are the same in different rows. These can be handled as follows:

(Note: whereas the above query is generic SQL, the following applies to MSSQL 2005 and above.)
It uses the Common Table Expression (CTE) feature, along with the ROW_NUMBER() function, to give every row a distinct value. It is essentially the same construct as the above, except that it now works on a "table" (a CTE behaves mostly like a table) that has a truly distinct identifier key.
Note that by removing "AND T2.Col1 = T1.Col1", we get a query that handles both types of duplicates (id-only duplicates, and duplicates of both id and Col1) in a single pass, much like Himadri's solution: the PARTITION in his/her CTE serves the same purpose as the subquery in this solution, and essentially the same amount of work is done. Depending on the situation, it may still be preferable, for performance or other reasons, to handle the two cases in two steps.

WITH T AS
  (SELECT ROW_NUMBER() OVER (ORDER BY id, Col1) AS rn, id, Col1 FROM MyTable)
-- T-SQL needs the alias named after DELETE and declared in a FROM clause.
DELETE T1
FROM T AS T1
WHERE EXISTS
   (SELECT *
    FROM T AS T2
    WHERE T2.id = T1.id AND T2.Col1 = T1.Col1
      AND T2.rn > T1.rn
   )
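
For reference, the single-query variant described in the note above might look like the sketch below. It is untested here and reuses the same placeholder names (MyTable, id, Col1). Because rn follows (id, Col1) order, it keeps the row with the highest rn for each id; which of several exact duplicates gets the highest rn is not something the ORDER BY pins down.

```sql
WITH T AS
  (SELECT ROW_NUMBER() OVER (ORDER BY id, Col1) AS rn, id, Col1 FROM MyTable)
-- Without the Col1 comparison, any later row with the same id counts as a duplicate.
DELETE T1
FROM T AS T1
WHERE EXISTS
   (SELECT *
    FROM T AS T2
    WHERE T2.id = T1.id
      AND T2.rn > T1.rn)   -- use < here to keep the lowest-numbered row instead
```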
mjv
This assumes (col2,col1) is unique, which seems a stretch
Andomar
@Andomar, right you are (though you mean id and Col1). See the edit, with a CTE construct that handles this case.
mjv
+1  A: 
DELETE FROM tableName
WHERE col2 NOT IN (SELECT MIN(col2) FROM tableName GROUP BY col1)

Make sure the sub select returns the rows you want to keep.
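
One way to check that first, as a minimal sketch using the same placeholder names (tableName, col1, col2), is to run the inner query on its own with col1 added to the output, so you can see which col2 value would survive for each group:

```sql
-- Preview only: lists, per col1 group, the col2 value the DELETE would keep.
SELECT col1, MIN(col2) AS col2_to_keep
FROM tableName
GROUP BY col1
ORDER BY col1;
```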

idstam
This assumes (col2,col1) is unique, which seems a stretch
Andomar
+2  A: 

Try the following query in SQL Server 2005:

WITH T AS
  (SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rnum, *
   FROM dbo.Table_1)
DELETE FROM T
WHERE rnum > 1
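
To try this against the question's sample data, here is a minimal sketch. The temp table #Dupes and its column types are assumptions made for illustration. The ORDER BY inside OVER is changed to Col1 so that the surviving row is predictable (the smallest Col1 per id) rather than arbitrary.

```sql
-- Hypothetical scratch table holding the question's sample rows.
CREATE TABLE #Dupes (id INT, Col1 VARCHAR(10));

-- INSERT ... SELECT ... UNION ALL is used because SQL Server 2005
-- does not support multi-row VALUES lists.
INSERT INTO #Dupes (id, Col1)
SELECT 1, 'aaa' UNION ALL
SELECT 1, 'bbb' UNION ALL
SELECT 1, 'ccc' UNION ALL
SELECT 2, 'abc' UNION ALL
SELECT 2, 'def';

-- Number the rows within each id, smallest Col1 first, then delete all but row 1.
WITH T AS
  (SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY Col1) AS rnum
   FROM #Dupes)
DELETE FROM T
WHERE rnum > 1;

-- Remaining rows: (1, 'aaa') and (2, 'abc').
SELECT id, Col1 FROM #Dupes ORDER BY id;

DROP TABLE #Dupes;
```

With ORDER BY id inside the partition, as in the query above, every row in a group ties, so which one ends up as row 1 is left to the server.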
Himadri
+1 This should work even if (col2,col1) is not unique
Andomar
A: 

SQL Server is not my native SQL database, but maybe something like this? The idea is to find the duplicates and delete the ones with the larger ROW_NUMBER, which should leave only the first one. I don't know if this is what you want or if it will work, but the logic seems sound.

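-- ROW_NUMBER() is a window function, not a column method, so number the rows in a CTE first.
-- MyTable is a placeholder for the real table name.
WITH Numbered AS
  (SELECT ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col1) AS rn
   FROM MyTable)
-- Then delete every row after the first within each Col1 group.
DELETE FROM Numbered
WHERE rn > 1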

Please feel free to correct me if SQL Server can't handle that kind of treatment :)

TerrorAustralis