This query shows me every first name and last name pair that has exactly two identical entries:

SELECT `firstname`,`lastname`,COUNT(*) AS Count 
FROM `people` 
GROUP BY `firstname`,`lastname`
HAVING Count = 2

How do I turn this into a DELETE FROM ... WHERE statement with a LIMIT, so that it removes only one of each pair of entries and leaves the other one?

Okay, this appears to be way too technical. I'm just going to do it in a PHP while loop.

+1  A: 

If you have a primary key, such as id, you can do:

delete from people 
where id not in
(
      select minid from 
      (select min(id) as minid from people 
      group by firstname, lastname) as newtable
)

The subquery select min(id)... picks one row (the one with the lowest id) for each firstname, lastname combination; the delete then removes every other row, i.e. your duplicates. You need to wrap your subquery due to a bug in MySQL; otherwise we could do:

delete from people 
where id not in
(
      select min(id) as minid from people 
      group by firstname, lastname
)

Better would be:

delete people from 
people left outer join
(
  select min(id) as minid from people 
  group by firstname, lastname
) people_grouped
on people.id = people_grouped.minid
where people_grouped.minid is null

to avoid the NOT IN subquery.
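To make the min(id) idea concrete, here is a small worked sketch. The table name `people_demo`, its `id` column, and the sample rows are assumptions made up for illustration; the question does not show the real table definition. It previews the rows that would go, then runs the wrapped-subquery delete from above.

```sql
-- Hypothetical demo table; `id` as primary key is an assumption.
CREATE TABLE people_demo (
    id INT PRIMARY KEY,
    firstname VARCHAR(10),
    lastname VARCHAR(10)
);

INSERT INTO people_demo (id, firstname, lastname) VALUES
    (1, 'Bob',  'Newhart'),
    (2, 'Bob',  'Newhart'),
    (3, 'Bill', 'Cosby'),
    (4, 'Jim',  'Gaffigan'),
    (5, 'Jim',  'Gaffigan');

-- Preview: rows whose id is not the lowest id of their name group.
-- With the sample data this returns ids 2 and 5.
SELECT p.*
FROM people_demo AS p
LEFT JOIN (
    SELECT MIN(id) AS minid
    FROM people_demo
    GROUP BY firstname, lastname
) AS keepers ON p.id = keepers.minid
WHERE keepers.minid IS NULL;

-- Delete everything except the lowest id in each name group.
-- The extra derived table (keepers) is the wrapping described above.
DELETE FROM people_demo
WHERE id NOT IN (
    SELECT minid FROM (
        SELECT MIN(id) AS minid
        FROM people_demo
        GROUP BY firstname, lastname
    ) AS keepers
);

-- Remaining rows: ids 1, 3 and 4.
SELECT * FROM people_demo ORDER BY id;
```

Running the preview SELECT first is a cheap way to check which rows will be removed before committing to the DELETE.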

davek
Can you explain this code?
Dasa
"you need to wrap your subquery due to a bug in mysql": When you do delete with a select from the same table, the table should be locked during the query and this isn't implemented yet in MySQL. See http://dev.mysql.com/doc/refman/5.0/en/delete.html: `Currently, you cannot delete from a table and select from the same table in a subquery.` Since MySQL knows about the risk, it prevents you from doing that type of query. What you're doing causes MySQL to not notice the problem, but the problem is still there. Having said that, it will probably be OK if there are no other simultaneous users.
Mark Byers
+2  A: 

You can create a table holding one record for each set of duplicates, then delete all the duplicate records from the people table, and finally re-insert one copy of each from the new table.

-- Setup for example
create table people (fname varchar(10), lname varchar(10));

insert into people values ('Bob', 'Newhart');
insert into people values ('Bob', 'Newhart');
insert into people values ('Bill', 'Cosby');
insert into people values ('Jim', 'Gaffigan');
insert into people values ('Jim', 'Gaffigan');
insert into people values ('Adam', 'Sandler');

-- Show table with duplicates
select * from people;

-- Create table with one version of each duplicate record
create table dups as 
    select distinct fname, lname, count(*) 
    from people group by fname, lname 
    having count(*) > 1;

-- Delete all matching duplicate records
delete people from people inner join dups 
on people.fname = dups.fname AND 
   people.lname = dups.lname;

-- Insert single record of each dup back into table
insert into people select fname, lname from dups;

-- Show Fixed table
select * from people;
RC
Is there no easier way?
Dasa
If you don't have any other field that makes each record unique (like the id that others have used in their answers), then this is a pretty straightforward approach: it works when the rows are truly identical, and you don't have to copy the entire table.
RC
DISTINCT is a costly operation, as it can require sorting the table to remove duplicate rows, so be careful about using it.
Damodharan R
A: 

Create a new table and add a unique key on (firstname,lastname). Then insert the rows from the old table into the new table. Then rename the tables.

mysql> select * from t;
+-----------+----------+
| firstname | lastname |
+-----------+----------+
| A         | B        | 
| A         | B        | 
| X         | Y        | 
+-----------+----------+
3 rows in set (0.00 sec)

mysql> create table t2 like t;
Query OK, 0 rows affected (0.00 sec)

mysql> alter table t2 add unique key name(firstname,lastname);
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert ignore into t2 select * from t;
Query OK, 2 rows affected (0.00 sec)
Records: 3  Duplicates: 1  Warnings: 0


mysql> select * from t2;
+-----------+----------+
| firstname | lastname |
+-----------+----------+
| A         | B        | 
| X         | Y        | 
+-----------+----------+
2 rows in set (0.01 sec)
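
The final step, renaming the tables, is not shown in the session above. A minimal sketch, assuming the table names t and t2 from the example and a hypothetical backup name t_old:

```sql
-- Swap the deduplicated copy into place with a single statement.
-- `t_old` is a hypothetical name for the retired original table.
RENAME TABLE t TO t_old, t2 TO t;

-- Once the new table has been checked, the old one can be dropped:
-- DROP TABLE t_old;
```

Keeping the original around as t_old until the new table has been checked makes the swap easy to undo.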
Damodharan R
A few potential issues with this suggestion: If there are constraints between tables, you'll have to disable those first. Even if very few rows need deleting, this method still requires copying (almost) the entire table. Finally, there may be other columns that weren't mentioned in the question, which would further increase the amount of data that needs copying.
Mark Byers
Foreign key constraints may be an issue. Regarding the copying, I feel this should be faster than joining the table, even though it copies the entire table. If the table is large, joining may be more expensive than this. I used only those two fields to demo it.
Damodharan R
This looks like a much neater solution than mine: http://www.justin-cook.com/wp/2006/12/12/remove-duplicate-entries-rows-a-mysql-database-table/
Damodharan R