ansaurus

Question

Trying to delete duplicate entries in SQL database deleted all the records. What went wrong?

Answer 1

+2 A:

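You should do the delete with a subselect, not a join.

The benefit of doing it this way is that you can preview the GUIDs you will delete before you actually delete them (just run the select query by itself).

This ought to do it. It deletes the smallest GUID for each emlPath that has more than one row, so if a path has three or more copies you will need to run it more than once.

 delete from emailTable where GUID in
 (
   -- run this select by itself to preview the GUIDs that will be deleted
   select MIN(dupe.GUID) from emailTable dupe
     INNER JOIN emailTable noDupe
     ON dupe.emlPath = noDupe.emlPath
     AND dupe.GUID <> noDupe.GUID  -- skip paths that have only one row
   where dupe.ReceivedOn between '2009-08-18' and '2009-08-20'
   GROUP BY dupe.emlPath
 )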

Byron Whitlock 2009-08-26 19:27:40

Answer 2

+1 A:

What you did wrong is that your query doesn't exclude any of the duplicates. It picks out every row whose GUID differs from that of another row with the same path, but that is true of every row in a group of duplicates, so all of them match.

What you have to do is to first pick out the duplicates that you want to keep, for example:

select min(GUID)
from emailTable
where ReceivedOn > '...' and ReceivedOn < '...'
group by emlPath
having count(*) > 1

Then you delete all duplicates except those.
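For what it's worth, here is a minimal sketch of those two steps, using Python's built-in sqlite3 module so that it runs without a server. The sample rows and the helper variable names are made up for illustration, and the ReceivedOn filter is left out for brevity. Note that the delete is restricted to paths that actually have duplicates, because the "keep" query only returns a GUID for those paths.

```python
import sqlite3

# In-memory database with made-up sample rows (assumption: two columns are enough here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emailTable (GUID TEXT PRIMARY KEY, emlPath TEXT)")
conn.executemany(
    "INSERT INTO emailTable VALUES (?, ?)",
    [
        ("a1", "/mail/1.eml"),
        ("a2", "/mail/1.eml"),
        ("a3", "/mail/1.eml"),
        ("b1", "/mail/2.eml"),  # not a duplicate, must survive
    ],
)

# Step 1: the one row to keep from each group of duplicates.
keepers_sql = """
    SELECT MIN(GUID) FROM emailTable
    GROUP BY emlPath
    HAVING COUNT(*) > 1
"""
# Paths that have more than one row.
duplicated_paths_sql = """
    SELECT emlPath FROM emailTable
    GROUP BY emlPath
    HAVING COUNT(*) > 1
"""
keepers = [row[0] for row in conn.execute(keepers_sql)]

# Step 2: delete the rows of duplicated paths, except the keepers.
conn.execute(
    f"DELETE FROM emailTable "
    f"WHERE emlPath IN ({duplicated_paths_sql}) "
    f"AND GUID NOT IN ({keepers_sql})"
)

remaining = [row[0] for row in conn.execute("SELECT GUID FROM emailTable ORDER BY GUID")]
print("kept from duplicate groups:", keepers)
print("rows left:", remaining)
```

Running the first query on its own, as done here for `keepers`, is a cheap way to check what will survive before deleting anything.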

Guffa 2009-08-26 19:35:18

OMG Ponies 2009-08-26 19:39:16

@rexem, beware of having a tie in the values being ranked. Use ROW_NUMBER() instead.

Jeff O 2009-08-26 20:03:41

Answer 3

+4 A:

You can do this without a second table. Something like this:

SELECT * FROM emailTable
WHERE EXISTS (
    SELECT * FROM emailTable AS t2
    WHERE t2.emlPath = emailTable.emlPath AND
    t2.GUID > emailTable.GUID)

That will show you which records are about to get deleted. If that's okay, change it to:

DELETE FROM emailTable
WHERE EXISTS (
    SELECT * FROM emailTable AS t2
    WHERE t2.emlPath = emailTable.emlPath AND
    t2.GUID > emailTable.GUID)

The condition t2.GUID > emailTable.GUID ensures that exactly one record per emlPath, the one with the largest GUID, remains in the table.
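A rough sketch of that preview-then-delete workflow, run against SQLite through Python's sqlite3 module rather than the original database. The sample rows are invented, and the same EXISTS condition is reused for both statements so that the preview and the delete cannot drift apart.

```python
import sqlite3

# In-memory database with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emailTable (GUID TEXT PRIMARY KEY, emlPath TEXT)")
conn.executemany(
    "INSERT INTO emailTable VALUES (?, ?)",
    [
        ("a1", "/mail/1.eml"),
        ("a2", "/mail/1.eml"),
        ("a3", "/mail/1.eml"),
        ("b1", "/mail/2.eml"),
    ],
)

# The shared condition: a row is doomed if another row with the same path has a larger GUID.
condition = """
    EXISTS (
        SELECT * FROM emailTable AS t2
        WHERE t2.emlPath = emailTable.emlPath
          AND t2.GUID > emailTable.GUID)
"""

# Preview: which rows would be deleted?
doomed = [
    row[0]
    for row in conn.execute(f"SELECT GUID FROM emailTable WHERE {condition} ORDER BY GUID")
]
print("would delete:", doomed)

# Same condition, now as a DELETE.
conn.execute(f"DELETE FROM emailTable WHERE {condition}")

remaining = [row[0] for row in conn.execute("SELECT GUID FROM emailTable ORDER BY GUID")]
print("rows left:", remaining)
```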

Thorarin 2009-08-26 19:36:33

Answer 4

A:

You should not use "=" in your join, i.e. "AND NOT (dupes.GUID = fullTable.GUID)". This condition won't exclude anything, since the GUIDs of duplicate rows are always different from each other, so every duplicate matches.

You should use greater than, i.e.:

delete from emailTable 
WHERE EXISTS
(
    SELECT 1 FROM emailTable t2
    WHERE emailTable.GUID > t2.GUID
    AND emailTable.emlPath= t2.emlPath
)
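
To see the difference between the two comparisons, here is a small sketch that only selects and never deletes. It uses SQLite via Python's sqlite3 module; the sample rows and the `matched` helper are hypothetical and only there for illustration.

```python
import sqlite3

# In-memory database with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emailTable (GUID TEXT PRIMARY KEY, emlPath TEXT)")
conn.executemany(
    "INSERT INTO emailTable VALUES (?, ?)",
    [
        ("a1", "/mail/1.eml"),
        ("a2", "/mail/1.eml"),
        ("a3", "/mail/1.eml"),
        ("b1", "/mail/2.eml"),
    ],
)


def matched(comparison):
    """Return the GUIDs a DELETE with this comparison operator would hit."""
    sql = f"""
        SELECT GUID FROM emailTable
        WHERE EXISTS (
            SELECT 1 FROM emailTable AS t2
            WHERE emailTable.emlPath = t2.emlPath
              AND emailTable.GUID {comparison} t2.GUID)
        ORDER BY GUID
    """
    return [row[0] for row in conn.execute(sql)]


# "Not equal" hits every copy, so nothing is left of a duplicated path.
not_equal = matched("<>")
# "Greater than" spares the smallest GUID of each path.
greater = matched(">")

print("<> matches:", not_equal)
print(">  matches:", greater)
```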

David 2009-08-26 19:38:35

Answer 5

A:

I prefer to use a common table expression for this and ROW_NUMBER():

with cte as (
   select row_number() over (partition by emlPath order by GUID) as eml_no
      , ReceivedOn
   from [emailTable])
delete from cte
   where eml_no > 1
   and ReceivedOn between '2009-08-18 23:59:59.999' AND '2009-08-20 00:00:00.000';

I prefer this because it gives strict control over which duplicate rows are deleted. I can delete the third one and keep two, I can choose whatever ordering I want to decide which row counts as the first one, and it deals fine with ties.
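
As an illustration of the "keep two, delete the third" case, here is a sketch in SQLite through Python's sqlite3 module. It is not the same statement as above: SQLite cannot delete through a CTE, so the row numbers are computed in a subquery instead, and window functions need SQLite 3.25 or later. The sample rows and the `keep` parameter are assumptions made for the example.

```python
import sqlite3

# In-memory database with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emailTable (GUID TEXT PRIMARY KEY, emlPath TEXT)")
conn.executemany(
    "INSERT INTO emailTable VALUES (?, ?)",
    [
        ("a1", "/mail/1.eml"),
        ("a2", "/mail/1.eml"),
        ("a3", "/mail/1.eml"),
        ("a4", "/mail/1.eml"),
        ("b1", "/mail/2.eml"),
    ],
)

# Number the rows within each emlPath by GUID, then delete everything past the first `keep`.
conn.execute(
    """
    DELETE FROM emailTable
    WHERE GUID IN (
        SELECT GUID FROM (
            SELECT GUID,
                   ROW_NUMBER() OVER (PARTITION BY emlPath ORDER BY GUID) AS eml_no
            FROM emailTable)
        WHERE eml_no > :keep)
    """,
    {"keep": 2},
)

remaining = [row[0] for row in conn.execute("SELECT GUID FROM emailTable ORDER BY GUID")]
print("rows left:", remaining)
```

Changing the ORDER BY inside the window decides which copies count as the first ones, and setting `keep` to 1 gives the usual "one row per path" result.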

Remus Rusanu 2009-08-26 20:24:13

Answer 6

A:

This is the code I ended up with, thanks to the help from all the posts:

DELETE A
  FROM [emailTable] A, [emailTable] B
  WHERE A.MessageID = B.MessageID
        AND A.GUID > B.GUID

swolff1978 2009-08-27 14:01:47

Answer 7

A:

I want to delete duplicate records, but I have no primary key. For example, I have three columns (id, name, address) and three identical records in the table, such as (1, Bob, London). I want to delete all the duplicate records but leave one record in the table.

Navkul 2009-12-27 20:22:01
