ansaurus

Question

sql delete rows with 1 column duplicated

Answer 1

A:

Try this.

DELETE FROM <TABLE_NAME_HERE> WHERE <SECOND_COLUMN_NAME_HERE> IN ('bbb', 'abc', 'def');

Jass 2009-10-30 06:43:14

hum... I hope muhan was after a more generic solution ;-)

mjv 2009-10-30 06:44:44

He didn't specify whether he wanted to delete randomly or lexicographically, so... um, second, third and fifth deleted *grin*

Jass 2009-10-30 06:50:15

Yes, that example table was just an example. My table is much larger and has many more duplicates.

muhan 2009-10-30 06:50:39

muhan - what is the rule that is used to decide which row to keep?

Mark 2009-10-30 12:26:44

Answer 2

+2 A:

Let's call these the id and the Col1 columns.

-- For each id, keep only the row with the largest Col1
DELETE FROM myTable
WHERE EXISTS
  (SELECT * FROM myTable T2
   WHERE T2.id = myTable.id AND T2.Col1 > myTable.Col1)
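
Before deleting anything, it may help to see which id values actually occur more than once. A minimal sketch, assuming the same myTable and id names used above:

```sql
-- ids that appear in more than one row, with the number of rows for each
SELECT id, COUNT(*) AS row_count
FROM myTable
GROUP BY id
HAVING COUNT(*) > 1;
```

If this returns no rows, there is nothing for the DELETE to remove.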

Edit: As pointed out by Andomar, the above doesn't get rid of exact duplicates, where both id and Col1 are the same in different rows. These can be handled as follows:

(Note: whereas the above query is generic SQL, the following applies to MSSQL 2005 and above.)
It uses the Common Table Expression (CTE) feature along with the ROW_NUMBER() function to produce a distinct value for each row. It is essentially the same construct as the above, except that it now works with a "table" (a CTE behaves mostly like a table) which has a truly distinct identifier key.
Note that by removing "AND T2.Col1 = T1.Col1", we get a query which handles both types of duplicates (id-only duplicates, and duplicates on both id and Col1) in a single query, i.e. in a similar fashion to Himadri's solution (the PARTITION in his/her CTE serves the same purpose as the subquery in this solution; essentially the same amount of work is done). Depending on the situation, it may be preferable, for performance or other reasons, to handle the two cases in two steps.

WITH T AS
  (SELECT ROW_NUMBER() OVER (ORDER BY id, Col1) AS rn, id, Col1 FROM MyTable)
DELETE T1
FROM T AS T1
WHERE EXISTS
   (SELECT *
    FROM T AS T2
    WHERE T2.id = T1.id AND T2.Col1 = T1.Col1
      AND T2.rn > T1.rn
   )

mjv 2009-10-30 06:43:46

This assumes (col2,col1) is unique, which seems a stretch

Andomar 2009-10-30 07:18:38

@Andomar: right you are (though you mean id and Col1). See the edit with a CTE construct which handles this case.

mjv 2009-10-30 12:58:01

Answer 3

+1 A:

DELETE FROM tableName
WHERE col2 NOT IN (SELECT MIN(col2) FROM tableName GROUP BY col1)

Make sure the sub select returns the rows you want to keep.
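
One way to check this, sketched with the same placeholder names (tableName, col1, col2), is to run the condition as a SELECT first. The rows it returns are the ones the DELETE would remove:

```sql
-- Preview only: lists the rows the DELETE would remove, without changing anything.
-- tableName, col1 and col2 are the placeholder names from this answer.
SELECT *
FROM tableName
WHERE col2 NOT IN (SELECT MIN(col2) FROM tableName GROUP BY col1);
```

If the result contains a row you want to keep, adjust the subselect before running the DELETE.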

idstam 2009-10-30 06:44:16

This assumes (col2,col1) is unique, which seems a stretch

Andomar 2009-10-30 07:18:02

Answer 4

+2 A:

Try the following query in SQL Server 2005:

WITH T AS
  (SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rnum, *
   FROM dbo.Table_1)
DELETE FROM T WHERE rnum > 1
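
To try the pattern on a small scale, here is a rough sketch against a temporary table. The #Demo table and its values are made up for illustration, and ORDER BY Col1 is an assumption added so that the surviving row in each group is predictable (the smallest Col1):

```sql
-- Hypothetical sample data; #Demo and its contents are illustrative only.
CREATE TABLE #Demo (id INT, Col1 VARCHAR(10));
INSERT INTO #Demo (id, Col1) VALUES (1, 'abc');
INSERT INTO #Demo (id, Col1) VALUES (1, 'bbb');
INSERT INTO #Demo (id, Col1) VALUES (2, 'def');
INSERT INTO #Demo (id, Col1) VALUES (2, 'def');
INSERT INTO #Demo (id, Col1) VALUES (3, 'xyz');

-- Number the rows within each id; every row after the first is a duplicate.
WITH T AS
  (SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY Col1) AS rnum, *
   FROM #Demo)
DELETE FROM T WHERE rnum > 1;

-- One row per id should remain.
SELECT id, Col1 FROM #Demo ORDER BY id;

DROP TABLE #Demo;
```

The single-row INSERT statements are used because multi-row VALUES lists are not available in SQL Server 2005.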

Himadri 2009-10-30 07:00:21

+1 This should work even if (col2,col1) is not unique

Andomar 2009-10-30 07:19:11

Answer 5

A:

SQL Server is not my native SQL database, but maybe something like this? The idea is to number the rows that share a Col1 value and delete the ones with the larger ROW_NUMBER. This should leave only the first one. I don't know if this is what you want or if it will work, but the logic seems sound.

WITH Numbered AS
  (SELECT ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col1) AS rn, Col1
   FROM T1)
DELETE FROM Numbered
WHERE rn > 1

Please feel free to correct me if SQL Server can't handle that kind of treatment :)

TerrorAustralis 2009-10-30 13:16:10
