views: 175
answers: 9

I want to run the following sql command:

ALTER TABLE `my_table` ADD UNIQUE (
`ref_id` ,
`type`
);

The problem is that some of the data in the table would make this invalid, therefore altering the table fails.

Is there a clever way in MySQL to delete the duplicate rows?

+4  A: 

SQL can, at best, handle this arbitrarily. To put it another way: deciding which rows to keep is your problem.

You have data that currently isn't unique. You want to make it unique. You need to decide how to handle the duplicates.

There are a variety of ways of handling this:

  • Modifying or deleting duplicate rows by hand if the numbers are sufficiently small;
  • Running statements to update or delete duplicates that meet certain criteria, to get to a point where the exceptions can be dealt with on an individual basis;
  • Copying the data to a temporary table, emptying the original and using queries to repopulate the table; and
  • so on.

Note: these all require user intervention.

You could of course just copy the table to a temporary table, empty the original and copy the rows back in, ignoring those that fail, but I expect that won't give you the results you really want.
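
If that arbitrary behaviour is acceptable, a rough sketch of the idea in MySQL (my_table_backup is an invented name, and which of the duplicate rows survives is arbitrary) could be:

CREATE TABLE my_table_backup LIKE my_table;
INSERT INTO my_table_backup SELECT * FROM my_table;

TRUNCATE TABLE my_table;
-- the table is now empty, so adding the unique key succeeds
ALTER TABLE my_table ADD UNIQUE (`ref_id`, `type`);

-- rows that would violate the new unique key are silently skipped
INSERT IGNORE INTO my_table SELECT * FROM my_table_backup;

DROP TABLE my_table_backup;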

cletus
A: 

You will have to find some other field that is unique because deleting on ref_id and type alone will delete them all.

To get the duplicates:

select ref_id, type from my_table group by ref_id, type having count(*)>1

Xaprb has some clever tricks (maybe too clever): http://www.xaprb.com/blog/2007/02/06/how-to-delete-duplicate-rows-with-sql-part-2/
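
If such a unique field does exist (say an auto-increment id column, which the question doesn't mention), one rough way to keep a single row per ref_id/type pair is a self-join delete along these lines:

-- keeps the row with the lowest id in each ref_id/type group
DELETE t1 FROM my_table t1
JOIN my_table t2
  ON t1.ref_id = t2.ref_id
 AND t1.type   = t2.type
 AND t1.id     > t2.id;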

Jonas Elfström
are you sure about this one? it looks like it might delete all of the duplicates (instead of leaving one)
SpliFF
A select never deletes anything... I guess it was worth the downvote anyway.
Jonas Elfström
A: 

What you can do is add a temporary identity column to your table. With that you can write a query to identify and delete the duplicates (you can tweak the query a little to make sure only one copy from each set of duplicate rows is retained).

Once this is done, drop the temporary column and add the unique constraint to your original columns.
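
As a rough sketch (tmp_id is an invented column name, and this assumes the table does not already have an AUTO_INCREMENT column):

ALTER TABLE my_table ADD COLUMN tmp_id INT NOT NULL AUTO_INCREMENT, ADD KEY (tmp_id);

-- keep the first row of each ref_id/type group and delete the rest; the derived
-- table works around MySQL's restriction on selecting from the table being deleted from
DELETE FROM my_table
WHERE tmp_id NOT IN (
    SELECT keep_id FROM (
        SELECT MIN(tmp_id) AS keep_id
        FROM my_table
        GROUP BY ref_id, type
    ) AS keepers
);

ALTER TABLE my_table DROP COLUMN tmp_id, ADD UNIQUE (`ref_id`, `type`);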

Hope this helps.

Ajay Singh
A: 

What I've done in the past is export the unique set of data, drop the table, recreate it with the unique columns and import the data.

It is often faster than trying to figure out how to delete the duplicate data.
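
In MySQL that can be sketched roughly like this (my_table_new and my_table_old are invented names; any one row per ref_id/type pair survives):

CREATE TABLE my_table_new LIKE my_table;
ALTER TABLE my_table_new ADD UNIQUE (`ref_id`, `type`);

-- rows that would violate the unique key are skipped
INSERT IGNORE INTO my_table_new SELECT * FROM my_table;

-- swap the tables; drop my_table_old once the result has been checked
RENAME TABLE my_table TO my_table_old, my_table_new TO my_table;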

Charlie
A: 

There are some suggestions on how to delete duplicate rows in a table in the answers to this question.

Dave Webb
A: 

There is a good Microsoft KB article that provides a step-by-step approach to finding and removing rows that have duplicate values. It describes two approaches: a one-off approach for finding and removing a single duplicate row, and a broader solution for when many rows are involved.

http://support.microsoft.com/kb/139444

Bubbafat
-1 because this is a MySQL question
Raj More
The procedure itself is not SQL Server specific.
pompo
+2  A: 

If you don't care which row gets deleted, use IGNORE:

ALTER IGNORE TABLE `my_table` ADD UNIQUE (
`ref_id` ,
`type`
);
longneck
Thanks, this has worked nicely. I was unaware of the IGNORE keyword
Lizard
A: 

Here is a snippet I used to delete duplicate rows in one of my tables:

BEGIN TRANSACTION

-- rank the rows within each logical key; the row to keep gets rank 1
Select *,
rank() over (Partition by PolicyId, PlanSeqNum, BaseProductSeqNum,
     CoInsrTypeCd, SupplierTypeSeqNum
      order by CoInsrAmt desc) as MyRank
into #tmpTable
from PlanCoInsr

-- keep only the top-ranked row per key
select distinct PolicyId, PlanSeqNum, BaseProductSeqNum,
     SupplierTypeSeqNum, CoInsrTypeCd, CoInsrAmt
into #tmpTable2
from #tmpTable where MyRank = 1

-- reload the original table with the de-duplicated rows
truncate table PlanCoInsr

insert into PlanCoInsr
    select * from #tmpTable2

drop table #tmpTable
drop table #tmpTable2

COMMIT
epitka
A: 

This worked for me:

ALTER TABLE table_name ADD UNIQUE KEY field_name (field_name)
Derrick