ansaurus

Question

Answer 1

+1 A:

In MySQL:

CREATE TABLE `new_table` LIKE `table1`;
INSERT INTO `new_table` ( SELECT * FROM `table1` GROUP BY field1 );
DROP TABLE `table1`;
RENAME TABLE `new_table` TO `table1`;

This won't choose a truly "random" duplicate row (MySQL simply keeps an arbitrary row from each group), but it may accomplish what you desire if you don't care which one survives.

If you have more fields that need to be unique in combination with the rest, add them to the GROUP BY clause.
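A minimal sketch of the copy-and-swap steps above, run against an in-memory SQLite database through Python's sqlite3 module so it can be tried without a MySQL server. SQLite standing in for MySQL is an assumption: it has no CREATE TABLE ... LIKE, so the new table is declared explicitly, and the sample rows are invented for illustration. Note that on MySQL servers with ONLY_FULL_GROUP_BY enabled, the SELECT * ... GROUP BY field1 form is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data with duplicate field1 values.
    CREATE TABLE table1 (field1 INTEGER, field2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (1, 'b'), (2, 'c'), (2, 'c'), (3, 'd');

    -- Copy one arbitrary row per field1 value into a fresh table, then swap names.
    CREATE TABLE new_table (field1 INTEGER, field2 TEXT);
    INSERT INTO new_table SELECT * FROM table1 GROUP BY field1;
    DROP TABLE table1;
    ALTER TABLE new_table RENAME TO table1;
""")

rows = conn.execute("SELECT field1, field2 FROM table1 ORDER BY field1").fetchall()
keys = [r[0] for r in rows]
print(keys)  # one row per field1 value: [1, 2, 3]
```

Which field2 value survives for each field1 is not specified, so the sketch only checks the keys.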

EDIT: Reverted to old answer

Fragsworth 2009-09-12 23:34:59

@Fragsworth: Woops, #1093 - You can't specify target table 'table1' for update in FROM clause

Alex 2009-09-13 04:38:58

Answer 2

+1 A:

Working off Fragsworth's answer, I'd:

Create a new table: NEW_TABLE
Define the field1 as the primary key
Insert rows into NEW_TABLE from the old table
Drop the old table
Rename NEW_TABLE to whatever the old table was called

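The primary key would stop rows with the same field1 value from being inserted, and would be better overall for later queries. In MySQL, do the copy with INSERT IGNORE so that duplicate rows are skipped instead of aborting the statement with a duplicate-key error.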
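Roughly, the steps listed in this answer could look like the sketch below. It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption, as are the table name old_table and the sample rows). SQLite spells the conflict-skipping insert INSERT OR IGNORE; the MySQL counterpart is INSERT IGNORE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical old table containing duplicate field1 values.
    CREATE TABLE old_table (field1 INTEGER, field2 TEXT);
    INSERT INTO old_table VALUES (1, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (3, 'e');

    -- field1 is the primary key, so a second row with the same value conflicts.
    CREATE TABLE new_table (field1 INTEGER PRIMARY KEY, field2 TEXT);
    -- OR IGNORE skips conflicting rows instead of aborting the whole statement.
    INSERT OR IGNORE INTO new_table SELECT field1, field2 FROM old_table;
    DROP TABLE old_table;
    ALTER TABLE new_table RENAME TO old_table;
""")

kept = conn.execute("SELECT field1 FROM old_table ORDER BY field1").fetchall()
print(kept)  # [(1,), (2,), (3,)]
```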

OMG Ponies 2009-09-12 23:40:44

Answer 3

A:

This should do it (untested, in SQL Server):

SELECT field1, field2
INTO #temp
FROM 
   (SELECT ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY NEWID()) AS __ROW, *
    FROM table1) x
WHERE x.__ROW = 1;

DELETE table1;

INSERT table1 
SELECT field1, field2
FROM #temp;
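For anyone without SQL Server at hand, the same keep-row-number-1 idea can be sketched with SQLite through Python's sqlite3 module. This is an illustration under assumptions, not a translation to MySQL: it needs SQLite 3.25 or later for window functions, RANDOM() replaces NEWID(), a temp table named survivors replaces #temp, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: three rows for field1 = 1, two for field1 = 2.
    CREATE TABLE table1 (field1 INTEGER, field2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (1, 'b'), (1, 'c'), (2, 'd'), (2, 'e');

    -- Number the rows of each field1 group in random order and keep row 1.
    CREATE TEMP TABLE survivors AS
    SELECT field1, field2
    FROM (SELECT ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY RANDOM()) AS rn,
                 field1, field2
          FROM table1) AS x
    WHERE x.rn = 1;

    -- Empty the original table and refill it from the survivors.
    DELETE FROM table1;
    INSERT INTO table1 SELECT field1, field2 FROM survivors;
""")

counts = conn.execute(
    "SELECT field1, COUNT(*) FROM table1 GROUP BY field1 ORDER BY field1"
).fetchall()
print(counts)  # [(1, 1), (2, 1)]
```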

Dave Markle 2009-09-12 23:55:18

The OP updated the question to say this is for MySQL

OMG Ponies 2009-09-13 00:19:21

Answer 4

A:

Make a new table OR add a unique key, self join, and delete all but the minimum key

New table:

So you could make a new table without dups. I imagine you thought of this already.

CREATE TABLE new_test (field1 INTEGER, field2 INTEGER);
INSERT INTO new_test(field1,field2) SELECT DISTINCT field1,field2 FROM test;
DROP TABLE test;
RENAME TABLE new_test TO test;

If you had a unique key, you could do a self join and identify the rows to delete as those whose key is greater than the minimum key in their group. If you didn't have such a key, you could make one:

Make unique key:

ALTER TABLE t2 ADD COLUMN (pk INTEGER NOT NULL AUTO_INCREMENT, PRIMARY KEY(pk));

Anyway, now you can do a self join and keep MIN(pk):

Self-join and delete dups:

DELETE dups.* FROM t2 AS dups
    INNER JOIN (
        SELECT field1, field2, MIN(pk) AS MPK FROM t2
        GROUP BY field1, field2 HAVING COUNT(*) > 1 ) AS keep
    ON keep.field1 = dups.field1
       AND keep.field2 = dups.field2
       AND keep.MPK <> dups.pk;
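To see the keep-MIN(pk) idea in action without a MySQL server, here is a small sketch using Python's sqlite3 module. It rests on a few assumptions: SQLite does not support MySQL's multi-table DELETE ... JOIN, so a NOT IN subquery is swapped in for the self join, the pk column is created up front rather than added with ALTER TABLE, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- pk plays the role of the AUTO_INCREMENT key added in the answer.
    CREATE TABLE t2 (pk INTEGER PRIMARY KEY AUTOINCREMENT,
                     field1 INTEGER, field2 INTEGER);
    INSERT INTO t2 (field1, field2)
    VALUES (1, 1), (1, 1), (2, 5), (1, 1), (2, 5), (3, 7);

    -- Keep the smallest pk of every (field1, field2) group, delete the rest.
    DELETE FROM t2
    WHERE pk NOT IN (SELECT MIN(pk) FROM t2 GROUP BY field1, field2);
""")

remaining = conn.execute("SELECT pk, field1, field2 FROM t2 ORDER BY pk").fetchall()
print(remaining)  # [(1, 1, 1), (3, 2, 5), (6, 3, 7)]
```

Unlike the random-row approaches, this one is deterministic: the earliest inserted row of each group is the one that stays.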

DigitalRoss 2009-09-13 01:16:24

-1 for the eye-hurting font

Andomar 2009-09-13 09:21:33

Ok, made it smaller

DigitalRoss 2009-09-13 16:41:36

Answer 5

+3 A:

The simplest way is to use the MySQL-specific ALTER IGNORE TABLE statement. Deleting rows by creating an index is unintuitive, but it works very well: with the IGNORE keyword, rows that would violate the new unique index are deleted as the index is built. Leaving the index in place also prevents any future duplicates; if you do not want that behaviour, just drop the index after creating it. Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so this only works on older servers.

ALTER IGNORE TABLE table1 ADD UNIQUE INDEX indexname (field1, field2);

RedFilter 2009-09-13 02:45:21

+1 Read this only after posting the exact same thing:)

Andomar 2009-09-13 09:20:57

Answer 6

A:

You can use MySQL's ALTER IGNORE syntax for that. The following command will remove any duplicates and leave one arbitrary row for each field1 value:

alter ignore table table1 add unique index index1 (field1);

It would be wise to keep the index in place, so new duplicates cannot be added. But if you'd like, you can remove the index with:

alter table table1 drop index index1;
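The effect of keeping or dropping the index can be sketched with Python's sqlite3 module. SQLite is only a stand-in here (it has no ALTER IGNORE, so the table starts out already deduplicated), and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table that has already been deduplicated on field1.
    CREATE TABLE table1 (field1 INTEGER, field2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    CREATE UNIQUE INDEX index1 ON table1 (field1);
""")

# With the unique index in place, a second row for field1 = 1 is rejected.
try:
    conn.execute("INSERT INTO table1 VALUES (1, 'z')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# After dropping the index, the same insert goes through again.
conn.execute("DROP INDEX index1")
conn.execute("INSERT INTO table1 VALUES (1, 'z')")
total = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]
print(rejected, total)  # True 3
```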

Andomar 2009-09-13 09:17:11


Cleaning up identical rows with SQL
