ansaurus

Question

Deleting duplicate record from table - SQL query

Answer 1

+1 A:

If you have the IDs of the rows you want to delete, then...

DELETE FROM table WHERE id IN (1, 4, 7, [id numbers to delete...])

Chalkey 2009-11-17 12:20:32

If you're going to mark down, at least give a reason.

Chalkey 2009-11-17 12:31:08

This seems like the long way of doing it... very manual...

Zoidberg 2009-11-17 13:26:21

@Zoidberg: his question says he has 3 duplicates and needs to delete two. If he knows the IDs, then this is sufficient.

Chalkey 2009-11-17 13:37:03

Again, question details are sketchy... so yeah, I can understand the confusion.

Zoidberg 2009-11-17 13:40:34

Answer 2

+2 A:

DELETE t1 FROM Table t1, Table t2 WHERE t1.colDup = t2.colDup AND t1.date < t2.date

This will delete every duplicate row from Table (on column colDup) except the newest (i.e. highest date), because a row is removed whenever another row with the same colDup has a later date.
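To see which row a `t1.date < t2.date` comparison keeps, here is a minimal sketch in MySQL's multi-table DELETE syntax. The table name `events` and the sample rows are hypothetical and exist only for illustration.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE events (
    colDup VARCHAR(20) NOT NULL,
    `date` DATE        NOT NULL
);

INSERT INTO events (colDup, `date`) VALUES
    ('a', '2009-01-01'),
    ('a', '2009-02-01'),
    ('a', '2009-03-01'),
    ('b', '2009-01-15');

-- Remove every row for which a row with the same colDup and a later date exists.
DELETE t1
FROM events AS t1
INNER JOIN events AS t2
        ON t1.colDup = t2.colDup
       AND t1.`date` < t2.`date`;

-- Remaining rows: ('a', '2009-03-01') and ('b', '2009-01-15').
```

Rows that tie on both colDup and date are not removed by this comparison, so fully identical rows need a different tie-breaker.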

jensgram 2009-11-17 12:21:16

The thing I don't like about this is that it assumes there is one column that differentiates these duplicate rows, but given the amount of detail in this question...

Zoidberg 2009-11-17 13:25:43

You're quite right, but then again; "the amount of detail...".

jensgram 2009-11-17 13:35:49

Tis true, I think Mark is correct about all this!

Zoidberg 2009-11-17 13:37:24

Answer 3

+2 A:

DELETE FROM `mytbl`
    INNER JOIN (
     SELECT 1 FROM `mytbl`
     GROUP BY `duplicated_column` HAVING COUNT(*)=2
    ) USING(`id`)

Edit:

My bad, the above query won't work.

Assuming table structure:

id int auto_increment

num int # <-- this is the column with duplicated values

The following query would work in MySQL (I checked):

DELETE `mytbl` FROM `mytbl` 
    INNER JOIN 
    (
     SELECT `num` FROM `mytbl`
     GROUP BY `num` HAVING COUNT(*)=2
    ) AS `tmp` USING (`num`)

The query deletes the rows whose value in the num column occurs exactly twice (no more, no fewer). Note that both copies are removed.
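Before running a delete like this, it may help to preview which values are affected. A small sketch, assuming the `mytbl` structure described above:

```sql
-- List each num value that occurs more than once, with its number of copies.
-- Change "> 1" to "= 2" to match the exact-pair condition used in the DELETE.
SELECT `num`, COUNT(*) AS copies
FROM `mytbl`
GROUP BY `num`
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

Any value reported with more than 2 copies is left untouched by the `COUNT(*)=2` condition.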

Edit (again):

I suggest adding a key (index) on the num column.

Edit(#3):

In case that the author wanted to delete the duplicated rows, the following should work for MySQL (it worked for me):

DELETE `delete_duplicated_rows` FROM `delete_duplicated_rows`
    NATURAL JOIN (
     SELECT *
     FROM `delete_duplicated_rows`
     GROUP BY `num1` HAVING COUNT(*)=2
    ) AS `der`

This assumes the table structure is:

CREATE TABLE `delete_duplicated_rows` (
  `num1` tinyint(4) NOT NULL,
  `num2` tinyint(4) NOT NULL
) ENGINE=MyISAM;

Dor 2009-11-17 12:21:44

I don't think that this solution actually follows the restrictions laid out in the problem, does it? Santanu indicated that all *rows* were duplicates. Having an IDENTITY column means that the rows aren't duplicates - and makes this a fairly easy problem. Muhammad's answer is better.

Mark Brittingham 2009-11-17 13:31:16

@Mark Brittingham: You're correct! I've edited my answer and added another solution.

Dor 2009-11-17 18:32:16

Answer 4

+1 A:

I assume the table has a unique identifier. If it does, you can write the following query:

Delete Table1 from Table1 t1 where 2 >= (select count(id) from Table1 where dupColumn = t1.dupColumn) and t1.id not in (select max (id) from Table1 where dupColumn = t1.dupColumn)

Oops. It seems the second filter alone is enough:

Delete Table1 from Table1 t1 where t1.id not in (select max (id) from Table1 where dupColumn = t1.dupColumn)

Danil 2009-11-17 13:00:45

Answer 5

+8 A:

Please try this query; it will definitely work, as I have tested it.

SET ROWCOUNT 1
DELETE test
FROM test a
WHERE (SELECT COUNT(*) FROM test b WHERE b.name = a.name) > 1
WHILE @@rowcount > 0
  DELETE test
  FROM test a
  WHERE (SELECT COUNT(*) FROM test b WHERE b.name = a.name) > 1
SET ROWCOUNT 0

where test is your table name
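Microsoft's documentation marks SET ROWCOUNT as deprecated for DELETE, INSERT, and UPDATE and recommends TOP instead. The same one-row-at-a-time loop could be sketched as follows, assuming SQL Server 2005 or later and the same `test` table and `name` column:

```sql
-- Sketch only: same table (test) and column (name) as the query above.
WHILE 1 = 1
BEGIN
    -- Remove a single row whose name still occurs more than once.
    DELETE TOP (1) a
    FROM test AS a
    WHERE (SELECT COUNT(*) FROM test AS b WHERE b.name = a.name) > 1;

    -- Stop once a pass finds nothing left to delete.
    IF @@ROWCOUNT = 0 BREAK;
END
```

Each pass deletes one row, so the loop ends with exactly one row left per name.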

Muhammad Akhtar 2009-11-17 13:11:08

@Muhammad - you're right. I left a comment in Dor's answer to that effect as well. This is the right answer as it doesn't assume anything "extra" in Santanu's question. I am playing with this answer and have to admit that I have only a glimmer of why it works. It uses ROWCOUNT and WHILE in ways that I just don't quite understand yet. Always good to learn something new! ...And I'm usually the one who gets the impossible answer on SQL Questions :-(

Mark Brittingham 2009-11-17 13:38:35

Mark, I think you should post this as an answer so I can upvote it!

Zoidberg 2009-11-17 13:38:36

Ok - I get the WHILE now - it isn't linked (magically) to the Delete. The SET ROWCOUNT 1 ensures that the following operations only apply to 1 ROW at a time rather than being fully set oriented. The Delete then deletes ONE row where there is a duplicate on the a.name. The WHILE makes use of the resulting @@rowcount and starts a loop. The loop dies when there are no more duplicates. All cool. *However* - I still am working on the Delete TblName From TblName Where X construct... It *is* required to make this work - just have to figure out why.

Mark Brittingham 2009-11-17 13:47:14

@Zoidberg - I will, along with an easier way to delete than found in Dor's answer. I want to see Muhammad's answer get a few more upvotes though. It is a *great* answer because it is forcing me to learn something new.

Mark Brittingham 2009-11-17 13:48:47

I thought @@rowcount was MSSQL, not across all dbs? MySQL has ROW_COUNT() instead, right? It's a quite nice bit of code, though

CodeByMoonlight 2009-11-17 14:03:13

Now that Muhammad's answer is "accepted" - I provide my own, which does not require a loop. In the end, I'd use mine (for reasons I indicate below) but *THIS* answer is why I surf StackOverflow: there are just so many really smart and experienced people here.

Mark Brittingham 2009-11-17 14:04:17

Muhammad - thanks again. What I didn't get about the "Delete TblName From TblName A..." construct is that you have to identify the table from which you'd be deleting rows because your FROM clause renames the table ("A"). Also, your WHERE is, technically, operating on another table: the "b" version of the test table. So, the delete could also be: DELETE FROM DupTable WHERE (SELECT COUNT(*) FROM DupTable b WHERE b.Col1 = DupTable.Col1) > 1

Mark Brittingham 2009-11-17 14:47:59

Answer 6

+3 A:

This works in SQL Server although it isn't a single statement:

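Declare @cnt int;
-- Assumes you are trying to delete the duplicates where some condition (e.g. Col1=1) is true.
Select @cnt = COUNT(*) From DupTable Where (Col1=1);
-- Keep one row; the delete must be restricted by the same condition as the count.
If @cnt > 1
    Delete Top (@cnt-1) From DupTable Where (Col1=1);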

It also doesn't require any extra assumptions (like the existence of another column that makes each row unique). After all, Santanu did say that the rows were duplicates and not just the one column.

However, the right answer, in my view, is to get a real table structure. That is, add an IDENTITY column to this table so that you can use a single SQL command to do your work. Like this:

ALTER TABLE dbo.DupTable ADD
    IDCol int NOT NULL IDENTITY (1, 1)
GO

Then the delete is trivial:

DELETE FROM DupTable WHERE IDCol NOT IN 
   (SELECT MAX(IDCol) FROM DupTable GROUP BY Col1, Col2, Col3)
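If altering the table is not an option, a different technique (not the one described above) is to number the rows inside each duplicate group and delete through a common table expression. A sketch, assuming SQL Server 2005 or later and the same Col1, Col2, Col3 columns; the name `Numbered` is only illustrative:

```sql
-- Number the rows within each group of identical (Col1, Col2, Col3) values.
-- ORDER BY (SELECT NULL) means the numbering order inside a group is arbitrary,
-- which is acceptable here because the rows in a group are identical.
WITH Numbered AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY Col1, Col2, Col3
               ORDER BY (SELECT NULL)
           ) AS rn
    FROM DupTable
)
-- Deleting from the CTE removes the underlying DupTable rows; rn = 1 is kept.
DELETE FROM Numbered
WHERE rn > 1;
```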

Mark Brittingham 2009-11-17 13:55:04

Answer 7

A:

  -- Just to demonstrate Mark's example
    . 
    -- START === 1.0.dbo..DuplicatesTable.TableCreate.sql
    /****** Object:  Table [dbo].[DuplicatesTable]
        Script Date: 03/29/2010 21:24:02 ******/
    IF EXISTS (SELECT * FROM sys.objects
        WHERE object_id = OBJECT_ID(N'[dbo].[DuplicatesTable]')
        AND type in (N'U'))
        DROP TABLE [dbo].[DuplicatesTable]
    GO

    /****** Object:  Table [dbo].[DuplicatesTable]
        Script Date: 03/29/2010 21:24:02 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[DuplicatesTable](
        [ColA] [varchar](10) NOT NULL, -- the name of the DuplicatesTable
        [ColB] [varchar](10) NULL      -- the description of the DuplicatesTable
    )


    /* 
    <doc> 
    Models a DuplicatesTable for 
    </doc>

    */


    GO


    --============================================================ DuplicatesTable START
    declare @ScriptFileName varchar(2000)
    SELECT @ScriptFileName = '$(ScriptFileName)'
    SELECT @ScriptFileName + ' --- DuplicatesTable START =========================================' 
    declare @TableName varchar(200)
    select @TableName = 'DuplicatesTable'

    SELECT 'SELECT name from sys.tables where name =''' + @TableName + ''''
    SELECT name from sys.tables 
    where name = @TableName

    DECLARE @TableCount INT 
    SELECT @TableCount  = COUNT(name ) from sys.tables 
        where name =@TableName

    if @TableCount=1
    SELECT ' DuplicatesTable PASSED. The Table ' + @TableName + ' EXISTS ' 
    ELSE 
    SELECT ' DuplicatesTable FAILED. The Table ' + @TableName + ' DOES NOT EXIST ' 
    SELECT @ScriptFileName + ' --- DuplicatesTable END =========================================' 
    --============================================================ DuplicatesTable END

    GO


    -- END ===  1.0.dbo..DuplicatesTable.TableCreate.sql

    . 
    -- START === 1.1..dbo..DuplicatesTable.TableInsert.sql

    BEGIN TRANSACTION;
    INSERT INTO [dbo].[DuplicatesTable]([ColA], [ColB])
    SELECT   N'ColA', N'ColB' UNION ALL
    SELECT N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA', N'ColB' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1' UNION ALL
    SELECT  N'ColA1', N'ColB1'
    COMMIT;
    RAISERROR (N'[dbo].[DuplicatesTable]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
    GO


    -- END ===  1.1..dbo..DuplicatesTable.TableInsert.sql

    . 
    -- START === 2.0.RemoveDuplicates.Script.sql
    ALTER TABLE dbo.DuplicatesTable ADD
            DuplicatesTableId int NOT NULL IDENTITY (1, 1)
    GO

    -- Then the delete is trivial:
    DELETE FROM dbo.DuplicatesTable WHERE DuplicatesTableId NOT IN 
         (SELECT MAX(DuplicatesTableId) FROM dbo.DuplicatesTable GROUP BY ColA , ColB)

         Select * from DuplicatesTable ;  
    -- END ===  2.0.RemoveDuplicates.Script.sql

YordanGeorgiev 2010-09-29 14:48:40
