ansaurus

Question

Removing duplicates from a table without using a temporary table

Answer 1

+2 A:

Can you use the row_number() function (http://msdn.microsoft.com/en-us/library/ms186734.aspx) to partition by the columns you're looking for dupes on, and delete where row number isn't 1?
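
A minimal sketch of that idea, assuming SQL Server 2005 or later. The table Letters, its column val and the CTE name Numbered are hypothetical names chosen for illustration; they do not come from the question.

```sql
-- Hypothetical table with no key and duplicate values (names are assumptions)
CREATE TABLE Letters (val varchar(10));
INSERT INTO Letters (val) VALUES ('A');
INSERT INTO Letters (val) VALUES ('B');
INSERT INTO Letters (val) VALUES ('B');
INSERT INTO Letters (val) VALUES ('C');
GO

-- Number the rows within each group of identical values...
WITH Numbered AS (
    SELECT val,
           ROW_NUMBER() OVER (PARTITION BY val ORDER BY val) AS rn
    FROM Letters
)
-- ...and delete every row after the first one in its group.
DELETE FROM Numbered
WHERE rn > 1;
```

The DELETE runs against the CTE but removes rows from the underlying table, so no temporary table is involved. Because the row number differs for otherwise identical rows, one copy of each value should remain.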

brydgesk 2010-05-09 06:13:43

Doesn't work - if you do that, you only have the 'A', 'B' and 'C' values to work with - when you delete all 'B' with row_number > 1, it will delete **all** instances of 'B' since there's no way to distinguish them if there are no other columns in the data set...

marc_s 2010-05-09 07:17:52

"The world is moving so fast these days that the man who says it can't be done is generally interrupted by someone doing it." brydgesk's could have been the accepted answer if he had not been discouraged from materializing his idea. brydgesk could have tried doing it.

Hao 2010-05-10 01:20:16

Nah, I wasn't discouraged. I just didn't feel this was a situation where the OP needed to be given code. All he needed was a concept. I've used row_number() for this exact purpose so I knew it was usable. Thanks for the support Hao.

brydgesk 2010-05-10 05:21:56

Answer 2

A:

I completely agree that having a unique identifier will save you a lot of time.

But if you can't use one (or if this is purely hypothetical), here's an alternative: determine the number of rows to delete for each distinct value (its count minus 1), then loop through and delete the top X rows for each distinct value.

Note that I'm not responsible for the number of kittens that are killed every time you use dynamic SQL.

declare @name varchar(50)
declare @sql varchar(max)
declare @numberToDelete int
-- One row per distinct name, with the number of surplus copies to remove
declare List cursor for
    select name, COUNT(name)-1 from #names group by name
OPEN List
FETCH NEXT FROM List 
INTO @name,@numberToDelete
WHILE @@FETCH_STATUS = 0
BEGIN
  IF @numberToDelete > 0
  BEGIN
    -- Double any embedded single quotes so a name like O'Brien does not break the statement
    set @sql = 'delete top(' + cast(@numberToDelete as varchar(10)) + ') from #names where name=''' + replace(@name, '''', '''''') + ''''
    print @sql
    exec(@sql)
  END
  FETCH NEXT FROM List INTO @name,@numberToDelete
END
CLOSE List
DEALLOCATE List

Another alternative would be to create a view with a generated identity. That way you could map the values to a unique identifier (allowing for a conventional delete) without making a permanent addition to your table.

seraphym 2010-05-09 07:24:43

Answer 3

A:

Select the grouped data into a temp table, truncate the original table, then move the data back into it.
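
A rough sketch of those three steps, assuming a hypothetical table Letters with a single column val and a temp table named #deduped (all of these names are assumptions for illustration). Note that TRUNCATE TABLE is rejected if the table is referenced by a foreign key; in that case a plain DELETE would be needed instead.

```sql
-- Assumes a hypothetical table Letters(val) that contains duplicate rows
BEGIN TRANSACTION;

-- 1. Copy one row per distinct value into a temp table
SELECT val
INTO #deduped
FROM Letters
GROUP BY val;

-- 2. Empty the original table
TRUNCATE TABLE Letters;

-- 3. Move the de-duplicated rows back
INSERT INTO Letters (val)
SELECT val FROM #deduped;

COMMIT TRANSACTION;

-- Clean up the temp table
DROP TABLE #deduped;
```

Wrapping the steps in a transaction means the original rows are not lost if the final insert fails partway through.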

A second option, which I am not sure will work: open the table directly in SQL Server Management Studio, select the rows and press CTRL + DEL to delete them. That is going to be extremely slow, because you have to delete every single row by hand.

adopilot 2010-05-09 07:57:37

Answer 4

+3 A:

-- Give every row a temporary id, then keep only the lowest id for each value
WITH TableWithKey AS (
    SELECT ROW_NUMBER() OVER (ORDER BY Col1) AS id, Col1 AS val
    FROM TableA
)
DELETE FROM TableWithKey
WHERE id NOT IN (
    SELECT MIN(id)
    FROM TableWithKey
    GROUP BY val
)

ewwwyn 2010-05-09 08:10:51

This is a CTE, not a temp table, if that's acceptable?

ewwwyn 2010-05-09 08:23:04

+1 interesting approach! And it works indeed. But still - why anyone would want to do this to himself is beyond me.... :-)

marc_s 2010-05-09 08:29:04

Answer 5

A:

You can remove duplicate rows using a cursor and DELETE ... WHERE CURRENT OF.

CREATE TABLE Client ([name] varchar(100))
INSERT Client VALUES('Bob')
INSERT Client VALUES('Alice')
INSERT Client VALUES('Bob')
GO
-- Names seen so far; any later row with the same name is a duplicate
DECLARE @history TABLE (name varchar(100) not null)
DECLARE @cursor CURSOR, @name varchar(100)
SET @cursor = CURSOR FOR SELECT name FROM Client
OPEN @cursor
FETCH NEXT FROM @cursor INTO @name
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @name IN (SELECT name FROM @history)
        -- Delete only the row the cursor is positioned on
        DELETE Client WHERE CURRENT OF @cursor
    ELSE
        INSERT @history VALUES (@name)

    FETCH NEXT FROM @cursor INTO @name
END
CLOSE @cursor
DEALLOCATE @cursor

Anthony Faull 2010-05-09 10:08:23
