I have a requirement where I have two records with the same values except for the PK. How can I delete one of them? I have plenty of such duplicate records.
DELETE
T1
FROM
My_Table T1
INNER JOIN My_Table T2 ON
T2.duplicate_column = T1.duplicate_column AND
T2.pk_column > T1.pk_column -- Keeps the row with the highest PK in each group; use "<" to keep the lowest instead
After that, you might want to consider looking at your database design since it seems like you have columns that are supposed to be unique but for which you don't have any unique constraint. Maybe that should be your PK or at the very least there should be a unique constraint on the column(s).
Since DB2 may not support the above, another option would be to use a subquery:
DELETE
FROM
My_Table
WHERE
pk_column IN
(
SELECT
T1.pk_column
FROM
My_Table T1
INNER JOIN My_Table T2 ON
T2.duplicate_column = T1.duplicate_column AND
T2.pk_column > T1.pk_column
)
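If you want to try the subquery version on throwaway data before running it for real, here is a minimal sketch in Python. It assumes SQLite (through the standard sqlite3 module) as a stand-in for DB2, so it only shows the logic of the statement, not DB2 behaviour. The sample rows and the helper names make_sample_db and delete_duplicates_keep_highest_pk are made up for the illustration.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory table with a few duplicated values (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE My_Table (pk_column INTEGER PRIMARY KEY, duplicate_column TEXT)"
    )
    conn.executemany(
        "INSERT INTO My_Table VALUES (?, ?)",
        [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "b"), (6, "c")],
    )
    return conn


def delete_duplicates_keep_highest_pk(conn):
    """Delete every row that has a twin with the same value and a higher PK."""
    conn.execute(
        """
        DELETE FROM My_Table
        WHERE pk_column IN (
            SELECT T1.pk_column
            FROM My_Table T1
            INNER JOIN My_Table T2 ON
                T2.duplicate_column = T1.duplicate_column AND
                T2.pk_column > T1.pk_column
        )
        """
    )
    conn.commit()


demo_conn = make_sample_db()
delete_duplicates_keep_highest_pk(demo_conn)
print(
    demo_conn.execute(
        "SELECT pk_column, duplicate_column FROM My_Table ORDER BY pk_column"
    ).fetchall()
)
# [(4, 'a'), (5, 'b'), (6, 'c')]
```

Only the row with the highest pk_column survives in each group, which matches the ">" comparison in the join.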
One solution is to write a procedure that opens a cursor to a query ordered by the columns that define duplicates, and use DELETE ... WHERE CURRENT OF CURSOR when the row is a duplicate of the previous row. Here's pseudocode for what I mean:
sql [ctx] C1 = { SELECT * FROM MyTable ORDER BY dup_column };
sql { FETCH C1 INTO row };
while ( !C1.endFetch() ) {
if ( row.dup_column == prevrow.dup_column ) {
sql [ctx] { DELETE FROM MyTable
WHERE CURRENT OF C1 };
}
prevrow.dup_column = row.dup_column;
sql { FETCH C1 INTO ... };
}
C1.close();
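The same ordered-scan idea can be sketched as runnable Python. This is not a positioned delete: Python's sqlite3 module has no WHERE CURRENT OF, so the sketch swaps in a delete by key after the scan. It assumes SQLite instead of DB2, and a key column named id, which is not in the pseudocode above. The function name and sample rows are hypothetical too.

```python
import sqlite3


def delete_duplicates_by_ordered_scan(conn):
    """Scan rows ordered by dup_column and delete each row that repeats the previous value."""
    rows = conn.execute(
        "SELECT id, dup_column FROM MyTable ORDER BY dup_column, id"
    ).fetchall()

    prev = object()  # sentinel that never equals a real column value
    doomed = []
    for row_id, dup in rows:
        if dup == prev:
            # Same value as the previous row, so this one is a duplicate.
            doomed.append((row_id,))
        prev = dup

    # Delete by key, standing in for DELETE ... WHERE CURRENT OF.
    conn.executemany("DELETE FROM MyTable WHERE id = ?", doomed)
    conn.commit()
    return len(doomed)


scan_conn = sqlite3.connect(":memory:")
scan_conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, dup_column TEXT)")
scan_conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "x"), (2, "y"), (3, "x"), (4, "x"), (5, "z"), (6, "y")],
)
print(delete_duplicates_by_ordered_scan(scan_conn))  # 3
print(scan_conn.execute("SELECT id, dup_column FROM MyTable ORDER BY id").fetchall())
# [(1, 'x'), (2, 'y'), (5, 'z')]
```

Ordering by id as a tie-breaker means the lowest id in each group is the one that is kept. Note that this Python comparison treats two NULL values as equal, which SQL itself would not.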
For each group of duplicate rows, find the smallest id, then delete all the remaining rows in that group.
delete
from MyTable
where Id not in (
select min(Id)
from MyTable
group by column1, column2 /* these are the unique columns */
)
Once the data is cleaned, add a unique constraint/index on the unique columns, as Tom H. suggested.
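To see the cleanup and the constraint working together, here is a rough sketch in Python. It assumes SQLite rather than DB2, and the index name ux_mytable_c1_c2, the function name, and the sample rows are all invented for the example. The DB2 syntax for adding the constraint may differ.

```python
import sqlite3


def dedupe_and_add_unique_index(conn):
    """Keep the smallest Id per (column1, column2) group, then forbid new duplicates."""
    conn.execute(
        """
        DELETE FROM MyTable
        WHERE Id NOT IN (
            SELECT MIN(Id)
            FROM MyTable
            GROUP BY column1, column2
        )
        """
    )
    # The index can only be created once the existing duplicates are gone.
    conn.execute("CREATE UNIQUE INDEX ux_mytable_c1_c2 ON MyTable (column1, column2)")
    conn.commit()


uniq_conn = sqlite3.connect(":memory:")
uniq_conn.execute(
    "CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, column1 TEXT, column2 INTEGER)"
)
uniq_conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, "a", 1), (2, "a", 1), (3, "a", 2), (4, "b", 1), (5, "a", 1)],
)
dedupe_and_add_unique_index(uniq_conn)
print([r[0] for r in uniq_conn.execute("SELECT Id FROM MyTable ORDER BY Id")])
# [1, 3, 4]

try:
    uniq_conn.execute("INSERT INTO MyTable (column1, column2) VALUES ('a', 1)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

After the index exists, an insert that repeats an existing (column1, column2) pair is rejected instead of quietly creating a new duplicate.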