What is the best way to remove duplicate rows from a fairly large table (i.e. 300,000+ rows)?

The rows of course will not be perfect duplicates because of the existence of the RowID identity field.

MyTable
-----------
RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null
+10  A: 

There's a good article on removing duplicates on the Microsoft Support site. It's pretty conservative - they have you do everything in separate steps - but it should work well against large tables.

I've used self-joins to do this in the past, although it could probably be prettied up with a HAVING clause:

delete from dupes
from MyTable dupes, MyTable fullTable
where dupes.dupField = fullTable.dupField 
  and dupes.secondDupField  = fullTable.secondDupField 
  and dupes.uniqueField > fullTable.uniqueField
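
To make the self-join idea concrete, here is a minimal sketch that runs against an in-memory SQLite database through Python's sqlite3 module, not against SQL Server. SQLite has no `DELETE ... FROM ... JOIN`, so the same condition (an identical row with a smaller key exists) is written as a correlated `EXISTS`. The sample rows and the helper names `make_table`, `delete_later_duplicates`, and `remaining_ids` are made up for illustration.

```python
import sqlite3


def make_table(conn):
    """Create MyTable and load a few rows, some duplicated (hypothetical sample data)."""
    conn.execute(
        "CREATE TABLE MyTable ("
        " RowID INTEGER PRIMARY KEY,"
        " Col1 TEXT NOT NULL,"
        " Col2 TEXT NOT NULL,"
        " Col3 INTEGER NOT NULL)"
    )
    rows = [
        ("a", "x", 1),
        ("a", "x", 1),
        ("b", "y", 2),
        ("a", "x", 1),
        ("b", "y", 2),
        ("c", "z", 3),
    ]
    conn.executemany("INSERT INTO MyTable (Col1, Col2, Col3) VALUES (?, ?, ?)", rows)
    conn.commit()


def delete_later_duplicates(conn):
    """Delete every row for which an identical row with a smaller RowID exists."""
    cur = conn.execute(
        """
        DELETE FROM MyTable
        WHERE EXISTS (
            SELECT 1
            FROM MyTable AS keeper
            WHERE keeper.Col1 = MyTable.Col1
              AND keeper.Col2 = MyTable.Col2
              AND keeper.Col3 = MyTable.Col3
              AND keeper.RowID < MyTable.RowID
        )
        """
    )
    conn.commit()
    return cur.rowcount  # number of rows deleted


def remaining_ids(conn):
    """Return the RowIDs that survived, in ascending order."""
    return [r[0] for r in conn.execute("SELECT RowID FROM MyTable ORDER BY RowID")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_table(conn)
    print("deleted:", delete_later_duplicates(conn))
    print("kept RowIDs:", remaining_ids(conn))
```

The row with the lowest RowID in each group never matches the `EXISTS` test, so exactly one row per distinct (Col1, Col2, Col3) survives.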
Jon Galloway
A: 

Create a new temporary table. For each row in the old table, check if it exists in the temp table, if it doesn't insert it, if it exists, then move to the next row.

Once done, drop the original, and rename the temp.
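
A set-based version of the copy-and-swap steps above might look like the sketch below. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, and replaces the row-by-row check with a single `INSERT ... SELECT DISTINCT`. The table name `MyTable_dedup` and the function name `rebuild_without_duplicates` are hypothetical, and the original RowID values are not preserved because the new table assigns fresh ones.

```python
import sqlite3


def rebuild_without_duplicates(conn):
    """Copy one row per distinct (Col1, Col2, Col3) into a new table, then swap it in."""
    conn.executescript(
        """
        BEGIN;
        CREATE TABLE MyTable_dedup (
            RowID INTEGER PRIMARY KEY,
            Col1 TEXT NOT NULL,
            Col2 TEXT NOT NULL,
            Col3 INTEGER NOT NULL
        );
        -- RowID is left out of the SELECT so that DISTINCT can collapse duplicates.
        INSERT INTO MyTable_dedup (Col1, Col2, Col3)
            SELECT DISTINCT Col1, Col2, Col3 FROM MyTable;
        DROP TABLE MyTable;
        ALTER TABLE MyTable_dedup RENAME TO MyTable;
        COMMIT;
        """
    )


def make_sample(conn):
    """Create MyTable with hypothetical sample rows, some of them duplicated."""
    conn.execute(
        "CREATE TABLE MyTable ("
        " RowID INTEGER PRIMARY KEY,"
        " Col1 TEXT NOT NULL, Col2 TEXT NOT NULL, Col3 INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO MyTable (Col1, Col2, Col3) VALUES (?, ?, ?)",
        [("a", "x", 1), ("a", "x", 1), ("b", "y", 2), ("b", "y", 2), ("c", "z", 3)],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_sample(conn)
    rebuild_without_duplicates(conn)
    print(conn.execute("SELECT Col1, Col2, Col3 FROM MyTable").fetchall())
```

On a real table you would also need to recreate any indexes, constraints, and permissions on the new table before the swap.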

Vaibhav
A: 

select distinct col1, col2, col3 from mytable

Do not include the RowId column in your select, since it makes every row unique.

nmiranda
+1  A: 

Here is another good article on removing duplicates.

It discusses why it's hard: "SQL is based on relational algebra, and duplicates cannot occur in relational algebra, because duplicates are not allowed in a set."

It covers the temp table solution and gives two MySQL examples.

In the future, are you going to prevent it at the database level or from the application? I would suggest the database level, because your database should be responsible for maintaining data integrity; developers will just cause problems ;)

Craig
+73  A: 

Assuming no nulls, you GROUP BY the unique columns, and SELECT the MIN (or MAX) RowId as the row to keep. Then, just delete everything that didn't have a row id:

DELETE MyTable 
FROM MyTable
LEFT OUTER JOIN (
   SELECT MIN(RowId) as RowId, Col1, Col2, Col3 
   FROM MyTable 
   GROUP BY Col1, Col2, Col3
) as KeepRows ON
   MyTable.RowId = KeepRows.RowId
WHERE
   KeepRows.RowId IS NULL
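
For anyone who wants to try the keep-the-MIN idea on a toy table first, here is a rough sketch using SQLite through Python's sqlite3 module, not SQL Server. SQLite cannot join inside a `DELETE`, so the sketch uses the `NOT IN (SELECT MIN(RowID) ... GROUP BY ...)` form, which selects the same rows to keep. It also counts the rows that would go before deleting anything. The sample data and the function names `count_rows_to_delete` and `delete_all_but_min_rowid` are assumptions for illustration only.

```python
import sqlite3


def make_sample(conn):
    """Create MyTable with hypothetical sample rows, some of them duplicated."""
    conn.execute(
        "CREATE TABLE MyTable ("
        " RowID INTEGER PRIMARY KEY,"
        " Col1 TEXT NOT NULL, Col2 TEXT NOT NULL, Col3 INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO MyTable (Col1, Col2, Col3) VALUES (?, ?, ?)",
        [("a", "x", 1), ("a", "x", 1), ("b", "y", 2),
         ("a", "x", 1), ("b", "y", 2), ("c", "z", 3)],
    )
    conn.commit()


def count_rows_to_delete(conn):
    """Total rows minus the number of distinct (Col1, Col2, Col3) groups."""
    return conn.execute(
        """
        SELECT (SELECT COUNT(*) FROM MyTable)
             - (SELECT COUNT(*) FROM
                   (SELECT 1 FROM MyTable GROUP BY Col1, Col2, Col3))
        """
    ).fetchone()[0]


def delete_all_but_min_rowid(conn):
    """Keep the lowest RowID of each group and delete every other row."""
    cur = conn.execute(
        """
        DELETE FROM MyTable
        WHERE RowID NOT IN (
            SELECT MIN(RowID) FROM MyTable GROUP BY Col1, Col2, Col3
        )
        """
    )
    conn.commit()
    return cur.rowcount  # number of rows deleted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_sample(conn)
    print("would delete:", count_rows_to_delete(conn))
    print("deleted:", delete_all_but_min_rowid(conn))
```

Comparing the two numbers is a cheap sanity check before running the delete on a table with hundreds of thousands of rows.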
Mark Brackett
A fantastically clear solution.
Chris
Mark you saved my time, very good solution.
Sharique
+1 An answer that keeps giving.
amelvin
Awesome solution, need to add this one to my book of tricks...
Abe Miessler
Would this work as well? `DELETE FROM MyTable WHERE RowId NOT IN (SELECT MIN(RowId) FROM MyTable GROUP BY Col1, Col2, Col3);`
Georg
CTEs can be used to do this more elegantly and possibly more efficiently in SQL Server 2005+, as in [my answer](http://stackoverflow.com/questions/18932/sql-how-can-i-remove-duplicate-rows/3822833#3822833)!
Martin Smith
A: 

@Craig

In the future are you going to prevent it at a database level, or from an application perspective

From the application level (unfortunately). I agree that the proper way to prevent duplication is at the database level through the use of a unique index, but in SQL Server 2005, an index is allowed to be only 900 bytes, and my varchar(2048) field blows that away.

Terrapin
What about unique constraint? Is it crippled in the same way?
Constantin
A: 

@me.yahoo.com/brackett

Thanks for the solution - I implemented it and it works great. Deleted 294,378 duplicate rows in 6 seconds.

Terrapin
I suggest that you post this kind of answer as a comment. Otherwise, the posts get separated.
Antoine Aubry
A: 

Oh sure. Use a temp table. If you want a single, not-very-performant statement that "works" you can go with:

DELETE FROM MyTable WHERE RowID NOT IN
    (SELECT 
        (SELECT TOP 1 RowID FROM MyTable mt2 WHERE mt2.Col1 = mt.Col1 AND mt2.Col2 = mt.Col2 AND mt2.Col3 = mt.Col3 ORDER BY mt2.RowID) 
    FROM MyTable mt)

Basically, for each row in the table, the sub-select finds the top RowID of all rows that are exactly like the row under consideration. So you end up with a list of RowIDs that represent the "original" non-duplicated rows.

Jacob Proffitt
A: 

From the application level (unfortunately). I agree that the proper way to prevent duplication is at the database level through the use of a unique index, but in SQL Server 2005, an index is allowed to be only 900 bytes, and my varchar(2048) field blows that away.

I dunno how well it would perform, but I think you could write a trigger to enforce this, even if you couldn't do it directly with an index. Something like:

-- given a table stories(story_id int not null primary key, story varchar(max) not null)
create trigger prevent_plagiarism on stories
after insert, update
as
declare @cnt as int
select @cnt = count(*)
from stories
inner join inserted on (stories.story = inserted.story and stories.story_id != inserted.story_id)
if @cnt > 0
begin
    raiserror('plagiarism detected', 16, 1)
    rollback transaction
end
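
If you want to see the trigger idea in action without a SQL Server instance, here is a small sketch in SQLite driven from Python's sqlite3 module. The syntax differs from T-SQL: SQLite has no `inserted` pseudo-table, so the sketch uses per-row `BEFORE` triggers with `NEW`/`OLD` and `RAISE(ABORT, ...)` instead of `raiserror` plus `rollback`. The trigger names and the helpers `install_duplicate_guard` and `try_insert` are hypothetical.

```python
import sqlite3


def install_duplicate_guard(conn):
    """Create the stories table and triggers that reject a story text already present."""
    conn.executescript(
        """
        CREATE TABLE stories (
            story_id INTEGER PRIMARY KEY,
            story TEXT NOT NULL
        );

        CREATE TRIGGER prevent_plagiarism_insert
        BEFORE INSERT ON stories
        WHEN EXISTS (SELECT 1 FROM stories WHERE story = NEW.story)
        BEGIN
            SELECT RAISE(ABORT, 'plagiarism detected');
        END;

        CREATE TRIGGER prevent_plagiarism_update
        BEFORE UPDATE OF story ON stories
        WHEN EXISTS (SELECT 1 FROM stories
                     WHERE story = NEW.story AND story_id <> OLD.story_id)
        BEGIN
            SELECT RAISE(ABORT, 'plagiarism detected');
        END;
        """
    )


def try_insert(conn, story):
    """Insert a story; return False if the trigger rejected it as a duplicate."""
    try:
        conn.execute("INSERT INTO stories (story) VALUES (?)", (story,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    install_duplicate_guard(conn)
    print(try_insert(conn, "once upon a time"))
    print(try_insert(conn, "once upon a time"))
```

Without an index on the compared column, each insert has to scan the table, which is the performance concern raised above.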

Also, varchar(2048) sounds fishy to me (some things in life are 2048 bytes, but it's pretty uncommon); should it really not be varchar(max)?

DrPizza
A: 
delete from my_dummy_table
where (pk_field1,pk_field2) in(
select 
  t.pk_field1
 ,t.pk_field2
from
  my_dummy_table t
 ,(
  select
    t1.pk_field1
   ,t1.pk_field2
  from
    my_dummy_table t1
   ,my_dummy_table t2
  where 1=1
    and (
         t1.pk_field1 <> t2.pk_field1
         or
         t1.pk_field2 <> t2.pk_field2
         ) 
    and t1.dup_field1 = t2.dup_field1
    and t1.dup_field2 = t2.dup_field2
  ) duplicates
where t.pk_field1 = duplicates.pk_field1
  and t.pk_field2 = duplicates.pk_field2
  and t.pk_field1||'-'||t.pk_field2 not in(
                                          select
                                            min(t1.pk_field1||'-'||t1.pk_field2) as keep_this_one
                                          from
                                            my_dummy_table t1
                                           ,my_dummy_table t2
                                          where 1=1
                                            and (
                                                 t1.pk_field1 <> t2.pk_field1
                                                 or
                                                 t1.pk_field2 <> t2.pk_field2
                                                 ) 
                                            and t1.dup_field1 = t2.dup_field1
                                            and t1.dup_field2 = t2.dup_field2
                                          group by
                                            t1.dup_field1
                                           ,t1.dup_field2
                                          )
)
JosephStyons
This looks far more complex than required! SQL Server doesn't have any such syntax as `||` either.
Martin Smith
A: 
  1. Create new blank table with the same structure

  2. Execute query like this

    INSERT INTO tc_category1 SELECT * FROM tc_category GROUP BY category_id, application_id HAVING count(*) > 1

  3. Then execute this query

    INSERT INTO tc_category1 SELECT * FROM tc_category GROUP BY category_id, application_id HAVING count(*) = 1

A: 

I had a table where I needed to preserve non-duplicate rows. I'm not sure about the speed or efficiency.

DELETE FROM myTable WHERE RowID IN (
  SELECT MIN(RowID) AS IDNo FROM myTable
  GROUP BY Col1, Col2, Col3
  HAVING COUNT(*) = 2 )
chrismar035
This assumes that there is at most 1 duplicate.
Martin Smith
A: 
    . 
    -- START === D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..dbo..DuplicatesTable.TableCreate.sql
    /****** Object:  Table [dbo].[DuplicatesTable]    Script Date: 03/29/2010 21:24:02 ******/
    IF  EXISTS (SELECT * FROM sys.objects 
    WHERE object_id = OBJECT_ID(N'[dbo].[DuplicatesTable]') AND type in (N'U'))
    DROP TABLE [dbo].[DuplicatesTable]
    GO

    /****** Object:  Table [dbo].[DuplicatesTable]    Script Date: 03/29/2010 21:24:02 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[DuplicatesTable](
        [DuplicatesTableId] [bigint] IDENTITY(1,1) NOT NULL, --the PK  for DuplicatesTable
        [ColA] [varchar](10) NOT NULL, -- the name of the DuplicatesTable
        [ColB] [varchar](10) NULL,  -- the description of the DuplicatesTable 
     CONSTRAINT [PK_DuplicatesTable] PRIMARY KEY CLUSTERED 
    (
        [DuplicatesTableId] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
    ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]



    /* 
    <doc> 
    Models a DuplicatesTable for 
    </doc>

    */


    GO


    --============================================================ DuplicatesTable START
    declare @ScriptFileName varchar(2000)
    SELECT @ScriptFileName = '$(ScriptFileName)'
    SELECT @ScriptFileName + ' --- DuplicatesTable START =========================================' 
    declare @TableName varchar(200)
    select @TableName = 'DuplicatesTable'

    SELECT 'SELECT name from sys.tables where name =''' + @TableName + ''''
    SELECT name from sys.tables 
    where name = @TableName

    DECLARE @TableCount INT 
    SELECT @TableCount  = COUNT(name ) from sys.tables 
        where name =@TableName

    if @TableCount=1
    SELECT ' DuplicatesTable PASSED. The Table ' + @TableName + ' EXISTS ' 
    ELSE 
    SELECT ' DuplicatesTable FAILED. The Table ' + @TableName + ' DOES NOT EXIST ' 
    SELECT @ScriptFileName + ' --- DuplicatesTable END =========================================' 
    --============================================================ DuplicatesTable END

    GO


    -- END ===  D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..dbo..DuplicatesTable.TableCreate.sql

    . 
    -- START === D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.1..dbo..DuplicatesTable.TableInsert.sql
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    GO

    SET IDENTITY_INSERT [dbo].[DuplicatesTable] ON;
    BEGIN TRANSACTION;
    INSERT INTO [dbo].[DuplicatesTable]([DuplicatesTableId], [ColA], [ColB])
    SELECT 1, N'ColA', N'ColB' UNION ALL
    SELECT 2, N'ColA', N'ColB' UNION ALL
    SELECT 3, N'ColA', N'ColB' UNION ALL
    SELECT 4, N'ColA', N'ColB' UNION ALL
    SELECT 5, N'ColA', N'ColB' UNION ALL
    SELECT 6, N'ColA', N'ColB' UNION ALL
    SELECT 7, N'ColA', N'ColB' UNION ALL
    SELECT 8, N'ColA1', N'ColB1' UNION ALL
    SELECT 9, N'ColA1', N'ColB1' UNION ALL
    SELECT 10, N'ColA1', N'ColB1' UNION ALL
    SELECT 11, N'ColA1', N'ColB1' UNION ALL
    SELECT 12, N'ColA1', N'ColB1' UNION ALL
    SELECT 13, N'ColA1', N'ColB1' UNION ALL
    SELECT 14, N'ColA1', N'ColB1'
    COMMIT;
    RAISERROR (N'[dbo].[DuplicatesTable]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
    GO

    SET IDENTITY_INSERT [dbo].[DuplicatesTable] OFF; 
    -- END ===  D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.1..dbo..DuplicatesTable.TableInsert.sql

    . 
    -- START === D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\2.0.RemoveDuplicates.Script.sql
    SELECT 'SELECT * FROM DuplicatesTable ; '
    SELECT * FROM DuplicatesTable ; 

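    -- drop the target only if it already exists, so the first run does not fail
    IF OBJECT_ID(N'dbo.NonDuplicatesTable', N'U') IS NOT NULL
        DROP TABLE NonDuplicatesTable ; 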

    SELECT ColA , ColB 
    into NonDuplicatesTable 
    from DuplicatesTable
    Group by  ColA , ColB 

    SELECT 'SELECT * FROM NonDuplicatesTable ; '
    SELECT * FROM NonDuplicatesTable ; 

    -- END ===  D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\2.0.RemoveDuplicates.Script.sql

    . 
    -- START === D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\all.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..dbo..DuplicatesTable.TableCreate.sql
    /****** Object:  Table [dbo].[DuplicatesTable]    Script Date: 03/29/2010 21:24:02 ******/
    IF  EXISTS (SELECT * FROM sys.objects 
    WHERE object_id = OBJECT_ID(N'[dbo].[DuplicatesTable]') AND type in (N'U'))
    DROP TABLE [dbo].[DuplicatesTable]
    GO

    /****** Object:  Table [dbo].[DuplicatesTable]    Script Date: 03/29/2010 21:24:02 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[DuplicatesTable](
        [DuplicatesTableId] [bigint] IDENTITY(1,1) NOT NULL, --the PK  for DuplicatesTable
        [ColA] [varchar](10) NOT NULL, -- the name of the DuplicatesTable
        [ColB] [varchar](10) NULL,  -- the description of the e DuplicatesTable 
     CONSTRAINT [PK_DuplicatesTable] PRIMARY KEY CLUSTERED 
    (
        [DuplicatesTableId] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
    ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]



    /* 
    <doc> 
    Models a DuplicatesTable for 
    </doc>

    */


    GO


    --============================================================ DuplicatesTable START
    declare @ScriptFileName varchar(2000)
    SELECT @ScriptFileName = '$(ScriptFileName)'
    SELECT @ScriptFileName + ' --- DuplicatesTable START =========================================' 
    declare @TableName varchar(200)
    select @TableName = 'DuplicatesTable'

    SELECT 'SELECT name from sys.tables where name =''' + @TableName + ''''
    SELECT name from sys.tables 
    where name = @TableName

    DECLARE @TableCount INT 
    SELECT @TableCount  = COUNT(name ) from sys.tables 
        where name =@TableName

    if @TableCount=1
    SELECT ' DuplicatesTable PASSED. The Table ' + @TableName + ' EXISTS ' 
    ELSE 
    SELECT ' DuplicatesTable FAILED. The Table ' + @TableName + ' DOES NOT EXIST ' 
    SELECT @ScriptFileName + ' --- DuplicatesTable END =========================================' 
    --============================================================ DuplicatesTable END

    GO


    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..dbo..DuplicatesTable.TableCreate.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.1..dbo..DuplicatesTable.TableInsert.sql
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    GO

    SET IDENTITY_INSERT [dbo].[DuplicatesTable] ON;
    BEGIN TRANSACTION;
    INSERT INTO [dbo].[DuplicatesTable]([DuplicatesTableId], [ColA], [ColB])
    SELECT 1, N'ColA', N'ColB' UNION ALL
    SELECT 2, N'ColA', N'ColB' UNION ALL
    SELECT 3, N'ColA', N'ColB' UNION ALL
    SELECT 4, N'ColA', N'ColB' UNION ALL
    SELECT 5, N'ColA', N'ColB' UNION ALL
    SELECT 6, N'ColA', N'ColB' UNION ALL
    SELECT 7, N'ColA', N'ColB' UNION ALL
    SELECT 8, N'ColA1', N'ColB1' UNION ALL
    SELECT 9, N'ColA1', N'ColB1' UNION ALL
    SELECT 10, N'ColA1', N'ColB1' UNION ALL
    SELECT 11, N'ColA1', N'ColB1' UNION ALL
    SELECT 12, N'ColA1', N'ColB1' UNION ALL
    SELECT 13, N'ColA1', N'ColB1' UNION ALL
    SELECT 14, N'ColA1', N'ColB1'
    COMMIT;
    RAISERROR (N'[dbo].[DuplicatesTable]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
    GO

    SET IDENTITY_INSERT [dbo].[DuplicatesTable] OFF; 
    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.1..dbo..DuplicatesTable.TableInsert.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\2.0.RemoveDuplicates.Script.sql
    SELECT 'SELECT * FROM DuplicatesTable ; '
    SELECT * FROM DuplicatesTable ; 

    Drop TABLE NonDuplicatesTable ; 

    SELECT ColA , ColB 
    into NonDuplicatesTable 
    from DuplicatesTable
    Group by  ColA , ColB 

    SELECT 'SELECT * FROM NonDuplicatesTable ; '
    SELECT * FROM NonDuplicatesTable ; 

    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\2.0.RemoveDuplicates.Script.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\all.txt

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..dbo..DuplicatesTable.TableCreate.sql
    /****** Object:  Table [dbo].[DuplicatesTable]    Script Date: 03/29/2010 21:24:02 ******/
    IF  EXISTS (SELECT * FROM sys.objects 
    WHERE object_id = OBJECT_ID(N'[dbo].[DuplicatesTable]') AND type in (N'U'))
    DROP TABLE [dbo].[DuplicatesTable]
    GO

    /****** Object:  Table [dbo].[DuplicatesTable]    Script Date: 03/29/2010 21:24:02 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[DuplicatesTable](
        [DuplicatesTableId] [bigint] IDENTITY(1,1) NOT NULL, --the PK  for DuplicatesTable
        [ColA] [varchar](10) NOT NULL, -- the name of the DuplicatesTable
        [ColB] [varchar](10) NULL,  -- the description of the e DuplicatesTable 
     CONSTRAINT [PK_DuplicatesTable] PRIMARY KEY CLUSTERED 
    (
        [DuplicatesTableId] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
    ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]



    /* 
    <doc> 
    Models a DuplicatesTable for 
    </doc>

    */


    GO


    --============================================================ DuplicatesTable START
    declare @ScriptFileName varchar(2000)
    SELECT @ScriptFileName = '$(ScriptFileName)'
    SELECT @ScriptFileName + ' --- DuplicatesTable START =========================================' 
    declare @TableName varchar(200)
    select @TableName = 'DuplicatesTable'

    SELECT 'SELECT name from sys.tables where name =''' + @TableName + ''''
    SELECT name from sys.tables 
    where name = @TableName

    DECLARE @TableCount INT 
    SELECT @TableCount  = COUNT(name ) from sys.tables 
        where name =@TableName

    if @TableCount=1
    SELECT ' DuplicatesTable PASSED. The Table ' + @TableName + ' EXISTS ' 
    ELSE 
    SELECT ' DuplicatesTable FAILED. The Table ' + @TableName + ' DOES NOT EXIST ' 
    SELECT @ScriptFileName + ' --- DuplicatesTable END =========================================' 
    --============================================================ DuplicatesTable END

    GO


    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..dbo..DuplicatesTable.TableCreate.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.1..dbo..DuplicatesTable.TableInsert.sql
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    GO

    SET IDENTITY_INSERT [dbo].[DuplicatesTable] ON;
    BEGIN TRANSACTION;
    INSERT INTO [dbo].[DuplicatesTable]([DuplicatesTableId], [ColA], [ColB])
    SELECT 1, N'ColA', N'ColB' UNION ALL
    SELECT 2, N'ColA', N'ColB' UNION ALL
    SELECT 3, N'ColA', N'ColB' UNION ALL
    SELECT 4, N'ColA', N'ColB' UNION ALL
    SELECT 5, N'ColA', N'ColB' UNION ALL
    SELECT 6, N'ColA', N'ColB' UNION ALL
    SELECT 7, N'ColA', N'ColB' UNION ALL
    SELECT 8, N'ColA1', N'ColB1' UNION ALL
    SELECT 9, N'ColA1', N'ColB1' UNION ALL
    SELECT 10, N'ColA1', N'ColB1' UNION ALL
    SELECT 11, N'ColA1', N'ColB1' UNION ALL
    SELECT 12, N'ColA1', N'ColB1' UNION ALL
    SELECT 13, N'ColA1', N'ColB1' UNION ALL
    SELECT 14, N'ColA1', N'ColB1'
    COMMIT;
    RAISERROR (N'[dbo].[DuplicatesTable]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
    GO

    SET IDENTITY_INSERT [dbo].[DuplicatesTable] OFF; 
    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.1..dbo..DuplicatesTable.TableInsert.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\2.0.RemoveDuplicates.Script.sql
    SELECT 'SELECT * FROM DuplicatesTable ; '
    SELECT * FROM DuplicatesTable ; 

    Drop TABLE NonDuplicatesTable ; 

    SELECT ColA , ColB 
    into NonDuplicatesTable 
    from DuplicatesTable
    Group by  ColA , ColB 

    SELECT 'SELECT * FROM NonDuplicatesTable ; '
    SELECT * FROM NonDuplicatesTable ; 

    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\2.0.RemoveDuplicates.Script.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\all.txt

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..d 
    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\all.txt

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\ConcatenateAll.bat
    ECHO EMPTY THE FILE 
    ECHO REMOVE THE OUTPUT FILE ALREADY ...
    ECHO. >all.txt

    ECHO FOR EACH LOG FILE IN THE CURRENT DIRECTORY DO CONTCATENATE IT 
    ECHO IN THE ALL.TXT FILE 
    for /f %%i in ('dir /s /b /a-d') do echo . >>all.txt&ECHO -- START ===== %%i>>all.txt&type %%i>>all.txt&ECHO. >>all.txt&echo -- END ================== %%i>>all.txt&ECHO. >>all.txt
    PAUSE
    ECHO OPEN THE ALL.TXT FILE 
    all.txt 
    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\ConcatenateAll.bat


    -- END ===  D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\all.sql

    . 
    -- START === D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\all.txt

    . 
    -- START === D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..dbo..DuplicatesTable.TableCreate.sql
    /****** Object:  Table [dbo].[DuplicatesTable]    Script Date: 03/29/2010 21:24:02 ******/
    IF  EXISTS (SELECT * FROM sys.objects 
    WHERE object_id = OBJECT_ID(N'[dbo].[DuplicatesTable]') AND type in (N'U'))
    DROP TABLE [dbo].[DuplicatesTable]
    GO

    /****** Object:  Table [dbo].[DuplicatesTable]    Script Date: 03/29/2010 21:24:02 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[DuplicatesTable](
        [DuplicatesTableId] [bigint] IDENTITY(1,1) NOT NULL, --the PK  for DuplicatesTable
        [ColA] [varchar](10) NOT NULL, -- the name of the DuplicatesTable
        [ColB] [varchar](10) NULL,  -- the description of the e DuplicatesTable 
     CONSTRAINT [PK_DuplicatesTable] PRIMARY KEY CLUSTERED 
    (
        [DuplicatesTableId] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
    ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]



    /* 
    <doc> 
    Models a DuplicatesTable for 
    </doc>

    */


    GO


    --============================================================ DuplicatesTable START
    declare @ScriptFileName varchar(2000)
    SELECT @ScriptFileName = '$(ScriptFileName)'
    SELECT @ScriptFileName + ' --- DuplicatesTable START =========================================' 
    declare @TableName varchar(200)
    select @TableName = 'DuplicatesTable'

    SELECT 'SELECT name from sys.tables where name =''' + @TableName + ''''
    SELECT name from sys.tables 
    where name = @TableName

    DECLARE @TableCount INT 
    SELECT @TableCount  = COUNT(name ) from sys.tables 
        where name =@TableName

    if @TableCount=1
    SELECT ' DuplicatesTable PASSED. The Table ' + @TableName + ' EXISTS ' 
    ELSE 
    SELECT ' DuplicatesTable FAILED. The Table ' + @TableName + ' DOES NOT EXIST ' 
    SELECT @ScriptFileName + ' --- DuplicatesTable END =========================================' 
    --============================================================ DuplicatesTable END

    GO


    -- END ===  D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..dbo..DuplicatesTable.TableCreate.sql

    . 
    -- START === D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.1..dbo..DuplicatesTable.TableInsert.sql
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    GO

    SET IDENTITY_INSERT [dbo].[DuplicatesTable] ON;
    BEGIN TRANSACTION;
    INSERT INTO [dbo].[DuplicatesTable]([DuplicatesTableId], [ColA], [ColB])
    SELECT 1, N'ColA', N'ColB' UNION ALL
    SELECT 2, N'ColA', N'ColB' UNION ALL
    SELECT 3, N'ColA', N'ColB' UNION ALL
    SELECT 4, N'ColA', N'ColB' UNION ALL
    SELECT 5, N'ColA', N'ColB' UNION ALL
    SELECT 6, N'ColA', N'ColB' UNION ALL
    SELECT 7, N'ColA', N'ColB' UNION ALL
    SELECT 8, N'ColA1', N'ColB1' UNION ALL
    SELECT 9, N'ColA1', N'ColB1' UNION ALL
    SELECT 10, N'ColA1', N'ColB1' UNION ALL
    SELECT 11, N'ColA1', N'ColB1' UNION ALL
    SELECT 12, N'ColA1', N'ColB1' UNION ALL
    SELECT 13, N'ColA1', N'ColB1' UNION ALL
    SELECT 14, N'ColA1', N'ColB1'
    COMMIT;
    RAISERROR (N'[dbo].[DuplicatesTable]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
    GO

    SET IDENTITY_INSERT [dbo].[DuplicatesTable] OFF; 
    -- END ===  D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.1..dbo..DuplicatesTable.TableInsert.sql

    . 
    -- START === D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\2.0.RemoveDuplicates.Script.sql
    SELECT 'SELECT * FROM DuplicatesTable ; '
    SELECT * FROM DuplicatesTable ; 

    Drop TABLE NonDuplicatesTable ; 

    SELECT ColA , ColB 
    into NonDuplicatesTable 
    from DuplicatesTable
    Group by  ColA , ColB 

    SELECT 'SELECT * FROM NonDuplicatesTable ; '
    SELECT * FROM NonDuplicatesTable ; 

    -- END ===  D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\2.0.RemoveDuplicates.Script.sql

    . 
    -- START === D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\all.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..dbo..DuplicatesTable.TableCreate.sql
    /****** Object:  Table [dbo].[DuplicatesTable]    Script Date: 03/29/2010 21:24:02 ******/
    IF  EXISTS (SELECT * FROM sys.objects 
    WHERE object_id = OBJECT_ID(N'[dbo].[DuplicatesTable]') AND type in (N'U'))
    DROP TABLE [dbo].[DuplicatesTable]
    GO

    /****** Object:  Table [dbo].[DuplicatesTable]    Script Date: 03/29/2010 21:24:02 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[DuplicatesTable](
        [DuplicatesTableId] [bigint] IDENTITY(1,1) NOT NULL, --the PK  for DuplicatesTable
        [ColA] [varchar](10) NOT NULL, -- the name of the DuplicatesTable
        [ColB] [varchar](10) NULL,  -- the description of the e DuplicatesTable 
     CONSTRAINT [PK_DuplicatesTable] PRIMARY KEY CLUSTERED 
    (
        [DuplicatesTableId] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
    ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]



    /* 
    <doc> 
    Models a DuplicatesTable for 
    </doc>

    */


    GO


    --============================================================ DuplicatesTable START
    declare @ScriptFileName varchar(2000)
    SELECT @ScriptFileName = '$(ScriptFileName)'
    SELECT @ScriptFileName + ' --- DuplicatesTable START =========================================' 
    declare @TableName varchar(200)
    select @TableName = 'DuplicatesTable'

    SELECT 'SELECT name from sys.tables where name =''' + @TableName + ''''
    SELECT name from sys.tables 
    where name = @TableName

    DECLARE @TableCount INT 
    SELECT @TableCount  = COUNT(name ) from sys.tables 
        where name =@TableName

    if @TableCount=1
    SELECT ' DuplicatesTable PASSED. The Table ' + @TableName + ' EXISTS ' 
    ELSE 
    SELECT ' DuplicatesTable FAILED. The Table ' + @TableName + ' DOES NOT EXIST ' 
    SELECT @ScriptFileName + ' --- DuplicatesTable END =========================================' 
    --============================================================ DuplicatesTable END

    GO


    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..dbo..DuplicatesTable.TableCreate.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.1..dbo..DuplicatesTable.TableInsert.sql
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    GO

    SET IDENTITY_INSERT [dbo].[DuplicatesTable] ON;
    BEGIN TRANSACTION;
    INSERT INTO [dbo].[DuplicatesTable]([DuplicatesTableId], [ColA], [ColB])
    SELECT 1, N'ColA', N'ColB' UNION ALL
    SELECT 2, N'ColA', N'ColB' UNION ALL
    SELECT 3, N'ColA', N'ColB' UNION ALL
    SELECT 4, N'ColA', N'ColB' UNION ALL
    SELECT 5, N'ColA', N'ColB' UNION ALL
    SELECT 6, N'ColA', N'ColB' UNION ALL
    SELECT 7, N'ColA', N'ColB' UNION ALL
    SELECT 8, N'ColA1', N'ColB1' UNION ALL
    SELECT 9, N'ColA1', N'ColB1' UNION ALL
    SELECT 10, N'ColA1', N'ColB1' UNION ALL
    SELECT 11, N'ColA1', N'ColB1' UNION ALL
    SELECT 12, N'ColA1', N'ColB1' UNION ALL
    SELECT 13, N'ColA1', N'ColB1' UNION ALL
    SELECT 14, N'ColA1', N'ColB1'
    COMMIT;
    RAISERROR (N'[dbo].[DuplicatesTable]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
    GO

    SET IDENTITY_INSERT [dbo].[DuplicatesTable] OFF; 
    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.1..dbo..DuplicatesTable.TableInsert.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\2.0.RemoveDuplicates.Script.sql
    SELECT 'SELECT * FROM DuplicatesTable ; '
    SELECT * FROM DuplicatesTable ; 

    Drop TABLE NonDuplicatesTable ; 

    SELECT ColA , ColB 
    into NonDuplicatesTable 
    from DuplicatesTable
    Group by  ColA , ColB 

    SELECT 'SELECT * FROM NonDuplicatesTable ; '
    SELECT * FROM NonDuplicatesTable ; 

    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\2.0.RemoveDuplicates.Script.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\all.txt

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..dbo..DuplicatesTable.TableCreate.sql
    /****** Object:  Table [dbo].[DuplicatesTable]    Script Date: 03/29/2010 21:24:02 ******/
    IF  EXISTS (SELECT * FROM sys.objects 
    WHERE object_id = OBJECT_ID(N'[dbo].[DuplicatesTable]') AND type in (N'U'))
    DROP TABLE [dbo].[DuplicatesTable]
    GO

    /****** Object:  Table [dbo].[DuplicatesTable]    Script Date: 03/29/2010 21:24:02 ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[DuplicatesTable](
        [DuplicatesTableId] [bigint] IDENTITY(1,1) NOT NULL, --the PK  for DuplicatesTable
        [ColA] [varchar](10) NOT NULL, -- the name of the DuplicatesTable
        [ColB] [varchar](10) NULL,  -- the description of the e DuplicatesTable 
     CONSTRAINT [PK_DuplicatesTable] PRIMARY KEY CLUSTERED 
    (
        [DuplicatesTableId] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, 
    ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
    ) ON [PRIMARY]



    /* 
    <doc> 
    Models a DuplicatesTable for 
    </doc>

    */


    GO


    --============================================================ DuplicatesTable START
    declare @ScriptFileName varchar(2000)
    SELECT @ScriptFileName = '$(ScriptFileName)'
    SELECT @ScriptFileName + ' --- DuplicatesTable START =========================================' 
    declare @TableName varchar(200)
    select @TableName = 'DuplicatesTable'

    SELECT 'SELECT name from sys.tables where name =''' + @TableName + ''''
    SELECT name from sys.tables 
    where name = @TableName

    DECLARE @TableCount INT 
    SELECT @TableCount  = COUNT(name ) from sys.tables 
        where name =@TableName

    if @TableCount=1
    SELECT ' DuplicatesTable PASSED. The Table ' + @TableName + ' EXISTS ' 
    ELSE 
    SELECT ' DuplicatesTable FAILED. The Table ' + @TableName + ' DOES NOT EXIST ' 
    SELECT @ScriptFileName + ' --- DuplicatesTable END =========================================' 
    --============================================================ DuplicatesTable END

    GO


    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.0..dbo..DuplicatesTable.TableCreate.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.1..dbo..DuplicatesTable.TableInsert.sql
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    GO

    SET IDENTITY_INSERT [dbo].[DuplicatesTable] ON;
    BEGIN TRANSACTION;
    INSERT INTO [dbo].[DuplicatesTable]([DuplicatesTableId], [ColA], [ColB])
    SELECT 1, N'ColA', N'ColB' UNION ALL
    SELECT 2, N'ColA', N'ColB' UNION ALL
    SELECT 3, N'ColA', N'ColB' UNION ALL
    SELECT 4, N'ColA', N'ColB' UNION ALL
    SELECT 5, N'ColA', N'ColB' UNION ALL
    SELECT 6, N'ColA', N'ColB' UNION ALL
    SELECT 7, N'ColA', N'ColB' UNION ALL
    SELECT 8, N'ColA1', N'ColB1' UNION ALL
    SELECT 9, N'ColA1', N'ColB1' UNION ALL
    SELECT 10, N'ColA1', N'ColB1' UNION ALL
    SELECT 11, N'ColA1', N'ColB1' UNION ALL
    SELECT 12, N'ColA1', N'ColB1' UNION ALL
    SELECT 13, N'ColA1', N'ColB1' UNION ALL
    SELECT 14, N'ColA1', N'ColB1'
    COMMIT;
    RAISERROR (N'[dbo].[DuplicatesTable]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
    GO

    SET IDENTITY_INSERT [dbo].[DuplicatesTable] OFF; 
    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\1.1..dbo..DuplicatesTable.TableInsert.sql

    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\2.0.RemoveDuplicates.Script.sql
    SELECT 'SELECT * FROM DuplicatesTable ; '
    SELECT * FROM DuplicatesTable ; 

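    -- Drop the target table only if a previous run left it behind
    IF OBJECT_ID(N'dbo.NonDuplicatesTable', N'U') IS NOT NULL
        DROP TABLE dbo.NonDuplicatesTable;

    -- Copy one row per distinct (ColA, ColB) pair into a new table;
    -- the identity column DuplicatesTableId is not carried over
    SELECT ColA, ColB
    INTO dbo.NonDuplicatesTable
    FROM dbo.DuplicatesTable
    GROUP BY ColA, ColB;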

    SELECT 'SELECT * FROM NonDuplicatesTable ; '
    SELECT * FROM NonDuplicatesTable ; 

    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\2.0.RemoveDuplicates.Script.sql


    . 
    -- START ===== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\ConcatenateAll.bat
    ECHO EMPTY THE FILE 
    ECHO REMOVE THE OUTPUT FILE ALREADY ...
    ECHO. >all.txt

    ECHO FOR EACH FILE UNDER THE CURRENT DIRECTORY DO CONCATENATE IT 
    ECHO IN THE ALL.TXT FILE 
    for /f "delims=" %%i in ('dir /s /b /a-d') do echo . >>all.txt&ECHO -- START ===== %%i>>all.txt&type "%%i">>all.txt&ECHO. >>all.txt&echo -- END ================== %%i>>all.txt&ECHO. >>all.txt
    PAUSE
    ECHO OPEN THE ALL.TXT FILE 
    all.txt 
    -- END ================== D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\ConcatenateAll.bat


    -- END ===  D:\cas\sfw\sql\sql_dev\Install\DbName\6.Testing\4.DuplicatesRemoval\all.sql

YordanGeorgiev
Is this a script to remove dupes? It seems a little long.
Ian
@Yordan - What is this exactly? Maybe some explanatory text would be good!
Martin Smith
+4  A: 

Answer

;With cte As
(
SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3 ORDER BY (SELECT 0)) RN
FROM MyTable
)
DELETE FROM cte WHERE RN<> 1
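
ORDER BY (SELECT 0) leaves it arbitrary which row of each duplicate group survives. If that matters, a variant along the following lines may help. It is only a sketch, assuming the MyTable definition from the question and assuming the lowest RowID in each group is the one to keep; the first statement previews the rows that the second would delete.

```sql
-- Sketch only: assumes MyTable(RowID, Col1, Col2, Col3) as defined in the question.

-- Preview: every row except the lowest RowID in each (Col1, Col2, Col3) group.
WITH cte AS
(
    SELECT RowID,
           ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3 ORDER BY RowID) AS RN
    FROM MyTable
)
SELECT RowID FROM cte WHERE RN > 1;

-- Delete: same numbering, with the CTE as the target of the DELETE.
WITH cte AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3 ORDER BY RowID) AS RN
    FROM MyTable
)
DELETE FROM cte WHERE RN > 1;
```

Ordering by RowID makes the outcome repeatable; swapping in ORDER BY RowID DESC would keep the newest row instead.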

Execution Plans

The execution plan for this is simpler and more efficient than that in the accepted answer as it does not require the self join.

Test Script

CREATE TABLE #MyTable
(
RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null
) 

INSERT INTO #MyTable (Col1, Col2, Col3)
SELECT 'aaa', 'aaa', 10 UNION ALL
SELECT 'aaa', 'aaa', 10 UNION ALL
SELECT 'bbb', 'bbb', 20 UNION ALL
SELECT 'aaa', 'aaa', 10 

;With cte As
(
SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3 ORDER BY (SELECT 0)) RN
FROM #MyTable
)
DELETE FROM cte WHERE RN<> 1

SELECT * FROM #MyTable

DROP TABLE #MyTable
Martin Smith
I don't understand why this answer has zero votes. It is much clearer than the accepted answer.
Antoine Aubry
A: 
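-- Self-join on every column that defines a duplicate; keeps the lowest RowID
delete t1
from MyTable t1, MyTable t2
where t1.Col1 = t2.Col1
and t1.Col2 = t2.Col2
and t1.Col3 = t2.Col3
and t1.RowID > t2.RowID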
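
Whichever delete is used, it is worth checking the result afterwards. A minimal sketch, assuming the MyTable columns from the question: list any groups that still contain more than one row. An empty result means no duplicates remain.

```sql
-- Sketch only: assumes MyTable(Col1, Col2, Col3) as defined in the question.
SELECT Col1, Col2, Col3, COUNT(*) AS Copies
FROM MyTable
GROUP BY Col1, Col2, Col3
HAVING COUNT(*) > 1;
```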
SoftwareGeek