I am using Postgres. I want to delete duplicate rows, with the condition that one copy from each set of duplicate rows is not deleted.
i.e.: if there are 5 duplicate records, then 4 of them will be deleted.
Try the steps described in this article: Removing duplicates from a PostgreSQL database.
It describes the situation where you have to deal with a huge amount of data that isn't practical to GROUP BY.
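A simple solution would be this:
-- keep the lowest id of each group of duplicates, delete every other row
DELETE FROM foo
WHERE id NOT IN (SELECT min(id)
                 FROM foo
                 GROUP BY hash);
where hash is the column (or comma-separated list of columns) whose values are duplicated. A HAVING clause is not needed: every group keeps its min(id), including groups with a single row.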
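A self-contained run of the query above, as a sketch: it assumes a hypothetical temporary table `foo` whose duplicated column is literally named `hash` and whose `id` is a unique, non-NULL key. Five rows share `hash = 'a'`, so four of them should be removed:

```sql
-- Hypothetical demo table: id is unique, hash holds the duplicated value
CREATE TEMP TABLE foo (id serial PRIMARY KEY, hash text);

INSERT INTO foo (hash) VALUES ('a'), ('a'), ('a'), ('a'), ('a'), ('b');

-- Keep the lowest id in every hash group, delete the rest (4 rows here)
DELETE FROM foo
WHERE id NOT IN (SELECT min(id)
                 FROM foo
                 GROUP BY hash);

-- expected: id 1 ('a') and id 6 ('b')
SELECT id, hash FROM foo ORDER BY id;
```

Because `id` is the primary key, `min(id)` is never NULL. If the subquery could return a NULL, `NOT IN` would match no rows and nothing would be deleted, so this pattern needs a non-NULL key column.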
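delete from table
where id not in
    (select max(id) from table group by [duplicate row]);
Here table and [duplicate row] stand for your table name and the list of columns that defines a duplicate. Keeping the row with the max id is an arbitrary choice of which row survives. If you need a different rule for choosing the row to keep, please provide more details.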
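A sketch of the same idea with a multi-column duplicate definition, assuming a hypothetical table `people` where a duplicate means the same `first_name` and `last_name`; `[duplicate row]` becomes that column list:

```sql
-- Hypothetical table: a "duplicate" is the same (first_name, last_name) pair
CREATE TEMP TABLE people (
    id         serial PRIMARY KEY,
    first_name text,
    last_name  text
);

INSERT INTO people (first_name, last_name) VALUES
    ('Ann', 'Lee'),
    ('Ann', 'Lee'),
    ('Bob', 'Ray'),
    ('Ann', 'Lee');

-- Keep the row with the highest id in each (first_name, last_name) group
DELETE FROM people
WHERE id NOT IN (SELECT max(id)
                 FROM people
                 GROUP BY first_name, last_name);

-- expected: id 3 (Bob Ray) and id 4 (Ann Lee)
SELECT * FROM people ORDER BY id;
```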
The fastest way is to join the table to itself with DELETE ... USING. http://www.postgresql.org/docs/8.1/interactive/sql-delete.html
mapy=# CREATE TABLE test(id INT, id2 INT);
CREATE TABLE
mapy=# INSERT INTO test VALUES(1,2);
INSERT 0 1
mapy=# INSERT INTO test VALUES(1,3);
INSERT 0 1
mapy=# INSERT INTO test VALUES(1,4);
INSERT 0 1
mapy=# DELETE FROM test t1 USING test t2 WHERE t1.id = t2.id AND t1.id2 < t2.id2;
DELETE 2
mapy=# SELECT * FROM test;
id | id2
----+-----
1 | 4
(1 row)
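The self-join keeps the row with the largest id2 for each id, so it relies on a column whose value differs between the copies. Rows that are identical in every column are never matched by t1.id2 < t2.id2. A sketch for that case, swapping the comparison onto PostgreSQL's ctid system column (the physical location of a row, unique within the table while the statement runs), assuming a hypothetical table `dupes`:

```sql
-- Hypothetical table holding exact duplicates: no column tells copies apart
CREATE TEMP TABLE dupes (id int, id2 int);

INSERT INTO dupes VALUES (1, 2), (1, 2), (1, 2), (5, 6);

-- For each group of identical rows, delete every copy that has another
-- copy with a higher ctid, so exactly one copy per group survives
DELETE FROM dupes t1
USING dupes t2
WHERE t1.id = t2.id
  AND t1.id2 = t2.id2
  AND t1.ctid < t2.ctid;

-- expected: (1, 2) and (5, 6)
SELECT * FROM dupes ORDER BY id;
```

Because = never matches NULL, duplicate rows that contain NULLs in the compared columns are left alone; IS NOT DISTINCT FROM would match them, at the cost of making the join harder for the planner to optimize.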