views:

507

answers:

13

Is there anyway I can erase all the duplicate entries from a certain table (users)? Here is a sample of the type of entries I have. I must say the table users consists of 3 fields, ID, user, and pass.

mysql_query("DELETE FROM users WHERE ???") or die(mysql_error());

randomtest
randomtest
randomtest
nextfile
baby
randomtest
dog
anothertest
randomtest
baby
nextfile
dog
anothertest
randomtest
randomtest

I want to be able to find the duplicate entries, and then delete all of the duplicates, and leave one.

A: 

A very simple solution would be to set an UNIQUE index on the table's column you wish to have unique values. Note that you subsequently cannot insert the same key twice.

Edit: My mistake, I hadn't read that last line: "I want to be able to find the duplicate entries".

Paul Lammertsma
But this will now help him to get rid of existing duplicate values. Actually he will not be able to create such index if the table already contains duplicate values.
Tomas
I know this, but wouldn't this output a error message if one tries to input duplicate data? Let's say I tried to upload a list of usernames. Duplicates lurk the list left and right. What would happen?
Homework
@Thomas Exactly!
Homework
I think that in general, having a unique index is fine if you are sure that you never want to have duplicate values, but this should always be used as a 'last resort check'. It is usually wise to do some programmatic check for the new vale duplicity before trying to insert it into the table to avoid you sql queries to be aborted with error (which is usually difficult to handle properly according to error message).
Tomas
A: 

I would get all the results, put them in an array of IDs and VALUES. Use a PHP function to work out the dupes, log all the IDs in an array, and use those values to delete the records.

dotty
Access granted. Although the guy answered the question, his way of deleting duplicates is much easier. http://stackoverflow.com/questions/1728048/removing-duplicate-field-entries-in-sql/1728724#1728724
Homework
A: 

I don't know your db schema, but the simplest solution seems to be to do SELECT DISTINCT on that table, keep the result in a variable (i.e. array), delete all records from the table and then reinsert the list returne by SELECT DISTINCT previously.

Tomas
You won't get the ID column using a select distinct on the username. If you include the ID column in the select you would get all row again.
badbod99
+4  A: 

You can do it with three sqls:

create table tmp as select distinct name from users;
drop table users;
alter table tmp rename users;
fritzone
Sounds as if this would actually work, testing now.
Homework
Won't work if Foreign Keys target this table though (have to Drop and Recreate those FKs)
Kristen
I'm just wondering, can foreign keys work in a situation with duplicate entries? (not being ironical, simply asking since I don't know ...)
fritzone
This won't work as there are 3 columns in the table...
ck
@fritzone - Foreign keys must reference a unique key. If the table has a synthetic primary key that that can be used by a foreign key. That's why synthetic primary keys must always be supported by unique constraints on the business key columns. In the case given, I suspect ID is the primary key, and hence unique, but the business key NAME has not been protected by a unique constraint.
APC
You will also loose your ID column!
badbod99
@APC - thanks, maybe I've over simplified the problem thinking there is only one column in Joey's table... :D
fritzone
Doesn't work at all. Thought it would.
Homework
A: 

If you have a Unique ID / Primary key on the table then:

DELETE FROM MyTable AS T1
WHERE MyID <
(
    SELECT MAX(MyID)
    FROM MyTable AS T2
    WHERE     T2.Col1 = T1.Col1
          AND T2.Col2 = T1.Col2
          ... repeat for all columns to consider duplicates ...
)

if you don't have a Unique Key select all distinct values into a temporary table, delete all original rows, and copy back from temporary table - but this will be problematic if you have Foreign Keys referring to this table

Kristen
P.S. And then add a UNIQUE INDEX to that column :)
Kristen
Be careful here, though. Make sure you add another where clause to uniquely identify the username you want to delete in the outer query, otherwise you will end up with a table containing just on username. :)
ZombieSheep
This doesn't work since in MySQL you are not allowed to have the sub-select's target be the same as delete's target. Run it and you'll see.
Emil Ivanov
Sorry, I don't have MySQL
Kristen
As the accepted answer works does MySQL allow a sub-select on the delete-table so long as it is in a JOIN, just not for a correlated sub-select?
Kristen
@ZombieSheep the inner query will take care of that (provided UserName is included in the WHERE) although I now appreciate that this doesn't work in MySQL
Kristen
A: 

The temporary table is an excellent solution, but I'd like to provide a SELECT query that grabs duplicate rows from the table as an alternative:

SELECT * FROM `users` LEFT JOIN (
        SELECT `name`, COUNT(`name`) AS `count`
        FROM `users` GROUP BY `name`
    ) AS `grouped`
    WHERE `grouped`.`name` = `users`.`name`
    AND `grouped`.`count`>1
Paul Lammertsma
+1  A: 

This delete script (SQL Server syntax) should work:

DELETE FROM Users
WHERE ID NOT IN (
    SELECT MIN(ID)
    FROM Users
    GROUP BY User
)
ck
This doesn't work since in MySQL you are not allowed to have the sub-select's target be the same as delete's target. Run it and you'll see.
Emil Ivanov
Doesn't work at all.
Homework
@Emil: I only have SQL Server to hand..
ck
A: 

Sorry, i can't comment answers. I think it'll be better write so:

create table tmp like users;
insert into tmp select distinct name from users;
drop table users;
alter table tmp rename users;
Dmitry Merkushin
A: 

Select your 3 columns as per your table structure and apply condition as per your requirements.

SELECT user.userId,user.username user.password FROM user As user GROUP BY user.userId, user.username HAVING (COUNT(user.username) > 1));

RIDDHI
+1  A: 

I assume that you have a structure like the following:

users
-----------------
| id | username |
-----------------
|  1 | joe      |
|  2 | bob      |
|  3 | jane     |
|  4 | bob      |
|  5 | bob      |
|  6 | jane     |
-----------------

Doing the magic with temporary is required since MySQL cannot use a sub-select in delete query that uses the delete's target table.

CREATE TEMPORARY TABLE IF NOT EXISTS users_to_delete (id INTEGER);

INSERT INTO users_to_delete (id)
    SELECT MIN(u1.id) as id
    FROM users u1
    INNER JOIN users u2 ON u1.username = u2.username
    GROUP BY u1.username;

DELETE FROM users WHERE id NOT IN (SELECT id FROM users_to_delete);

I know the query is a bit hairy but it does the work, even if the users table has more than 2 columns.

Emil Ivanov
+1 for the temp table workaround for MySQL's inablility to do it properly :)
ck
+1  A: 

You need to be a bit careful of how the data in your table is used. If this really is a users table, there is likely other tables with FKs pointing to the ID column. In which case you need to update those tables to use ID you have selected to keep.

If it's just a standalone table (no table reference it)

CREATE TEMPORARY TABLE Tmp (ID int);
INSERT INTO Tmp SELECT ID FROM USERS GROUP BY User;
DELETE FROM Users WHERE ID NOT IN (SELECT ID FROM Tmp);

Users table linked from other tables

Create the temporary tables including a link table that holds all the old id's and the respective new ids which other tables should reference instead.

CREATE TEMPORARY TABLE Keep (ID int, User varchar(45));
CREATE TEMPORARY TABLE Remove (OldID int, NewID int);
INSERT INTO Keep SELECT ID, User FROM USERS GROUP BY User;
INSERT INTO Remove SELECT u1.ID, u2.ID FROM Users u1 INNER JOIN Keep u2 ON u2.User = u1.User WHERE u1.ID NOT IN (SELECT ID FROM Users GROUP BY User);

Go through any tables which reference your users table and update their FK column (likely called UserID) to point to the New unique ID which you have selected, like so...

UPDATE MYTABLE t INNER JOIN Remove r ON t.UserID = r.OldID
SET t.UserID = r.NewID;

Finally go back to your users table and remove the no longer referenced duplicates:

DELETE FROM Users WHERE ID NOT IN (SELECT ID FROM Keep);

Clean up those Tmp tables:

DROP TABLE KEEP;
DROP TABLE REMOVE;
badbod99
A: 

Every answer above and/or below didn't work for me, therefore I decided to write my own little script. It's not the best, but it gets the job done.
Comments are included throughout, but this script is customized for my needs, and I hope the idea helps you.

I basically wrote the database contents to a temp file, called the temp file, applied the function to the called file to remove the duplicates, truncated the table, and then input the data right back into the SQL. Sounds like a lot, I know.

If you're confused as to what $setprofile is, it's a session that's created upon logging into my script (to establish a profile), and is cleared upon logging out.


<?php
// session and includes, you know the drill.
session_start();
include_once('connect/config.php');

// create a temp file with session id and current date
$datefile =  date("m-j-Y");
$file = "temp/$setprofile-$datefile.txt";

$f = fopen($file, 'w'); // Open in write mode

// call the user and pass via SQL and write them to $file
$sql = mysql_query("SELECT * FROM _$setprofile ORDER BY user DESC");
while($row = mysql_fetch_array($sql))
{
$user = $row['user'];
$pass = $row['pass'];

$accounts = "$user:$pass "; // the white space right here is important, it defines the separator for the dupe check function
fwrite($f, $accounts);

}
fclose($f);


// **** Dupe Function **** //

// removes duplicate substrings between the seperator
function uniqueStrs($seperator, $str) {
// convert string to an array using ' ' as the seperator
$str_arr = explode($seperator, $str);
// remove duplicate array values
$result = array_unique($str_arr);
// convert array back to string, using ' ' to glue it back
$unique_str = implode(' ', $result);
// return the unique string
return $unique_str;
}

// **** END Dupe Function **** //


// call the list we made earlier, so we can use the function above to remove dupes
$str = file_get_contents($file);
// seperator
$seperator = ' ';
// use the function to save a unique string
$new_str = uniqueStrs($seperator, $str);



// empty the table
mysql_query("TRUNCATE TABLE _$setprofile") or die(mysql_error());

// prep for SQL by replacing test:test with ('test','test'), etc.
// this isn't a sufficient way of converting, as  i said, it works for me.
$patterns = array("/([^\s:]+):([^\s:]+)/", "/\s++\(/");
$replacements = array("('$1', '$2')", ", (");


// insert the values into your table, and presto! no more dupes.
$sql = 'INSERT INTO `_'.$setprofile.'` (`user`, `pass`) VALUES ' . preg_replace($patterns, $replacements, $new_str) . ';';
$product = mysql_query($sql) or die(mysql_error()); // put $new_str here so it will replace new list with SQL formatting

// if all goes well.... OR wrong? :)
if($product){ echo "Completed!";
} else {
echo "Failed!";
}

unlink($file); // delete the temp file/list we made earlier
?>
Homework
+2  A: 

You can solve it with only one query.

If your table has the following structure:

CREATE TABLE  `users` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `username` varchar(45) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;

you could do something like that (this will delete all duplicate users based on username with and ID greater than the smaller ID for that username):

DELETE users
  FROM users INNER JOIN
  (SELECT MIN(id) as id, username FROM users GROUP BY username) AS t
  ON users.username = t.username AND users.id > t.id

It works and I've already use something similar to delete duplicates.

Sergi
THIS WORKS. I wrote that useless script for nothing! http://stackoverflow.com/questions/1728048/removing-duplicate-field-entries-in-sql/1728724#1728724
Homework
Well, most people don't know that you could do this type of things with MySQL. In fact, is even possible to delete related records from two or more different tables at the same time.
Sergi
It doesn't handle linked tables with FKs, but seems in this instance, he doesn't have any!
badbod99
@badbod99, I haven't tested it with InnoDB and FKs, but it that case I think is possible to change it to delete also associated rows in foreign table.
Sergi