I have a requirement where I have two records with the same values in every column except the PK. How can I delete one of them? I have plenty of such duplicate records.

A: 
delete from thetable where pk_column_name=pk_value;
nos
A: 
DELETE
     T1
FROM
     My_Table T1
INNER JOIN My_Table T2 ON
     T2.duplicate_column = T1.duplicate_column AND
     T2.pk_column > T1.pk_column  -- Keeps the row with the highest PK; use "<" to keep the lowest instead

After that, you might want to consider looking at your database design since it seems like you have columns that are supposed to be unique but for which you don't have any unique constraint. Maybe that should be your PK or at the very least there should be a unique constraint on the column(s).

Since DB2 may not support the above, another option would be to use a subquery:

DELETE
FROM
     My_Table
WHERE
     pk_column IN
     (
          SELECT
               T1.pk_column
          FROM
               My_Table T1
          INNER JOIN My_Table T2 ON
               T2.duplicate_column = T1.duplicate_column AND
               T2.pk_column > T1.pk_column
     )
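
To see which rows the subquery form actually removes, here is a minimal sketch that runs the same statement against a throwaway in-memory SQLite database from Python. SQLite stands in for DB2 purely as an assumption for the demo, and the sample rows are made up; the table and column names are the placeholders used above.

```python
import sqlite3

# Hypothetical demo data; SQLite is only a stand-in for DB2 here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE My_Table (pk_column INTEGER PRIMARY KEY, duplicate_column TEXT)"
)
conn.executemany(
    "INSERT INTO My_Table (pk_column, duplicate_column) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "c"), (6, "c")],
)

# Delete every row for which another row with the same value and a higher PK exists.
conn.execute(
    """
    DELETE FROM My_Table
    WHERE pk_column IN (
        SELECT T1.pk_column
        FROM My_Table T1
        INNER JOIN My_Table T2
            ON T2.duplicate_column = T1.duplicate_column
           AND T2.pk_column > T1.pk_column
    )
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT pk_column, duplicate_column FROM My_Table ORDER BY pk_column"
).fetchall()
print(remaining)  # [(3, 'b'), (4, 'a'), (6, 'c')]
```

Only the row with the highest pk_column in each group survives. Flipping the comparison to "<" would keep the lowest one instead.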
Tom H.
Have you tried this on DB2? AFAIK, DB2 does not support multi-table DELETE syntax. That's a non-standard extension to SQL, supported by e.g. MySQL.
Bill Karwin
A: 

One solution is to write a procedure that opens a cursor to a query ordered by the columns that define duplicates, and use DELETE ... WHERE CURRENT OF CURSOR when the row is a duplicate of the previous row. Here's pseudocode for what I mean:

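sql [ctx] C1 = { SELECT * FROM MyTable ORDER BY dup_column };

sql { FETCH C1 INTO row };
while ( !C1.endFetch() )  {
    // Rows arrive sorted by dup_column, so a duplicate always
    // directly follows the row it duplicates.
    // (prevrow starts out empty, so the first row is never deleted.)
    if ( row.dup_column == prevrow.dup_column ) {
        sql [ctx] { DELETE FROM MyTable
                     WHERE CURRENT OF C1  };
    }

    // Remember this row's key for the next comparison.
    prevrow.dup_column = row.dup_column;

    sql { FETCH C1 INTO ... };
}
C1.close();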
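
The same walk-the-sorted-rows idea can be sketched in runnable form with Python's sqlite3 module. This is not the SQLJ version above: SQLite has no WHERE CURRENT OF, so the sketch collects the primary keys of the duplicates and deletes them by key afterwards. The id column and the sample rows are assumptions made for the demo.

```python
import sqlite3

# Hypothetical demo table; the id column is an assumed primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, dup_column TEXT)")
conn.executemany(
    "INSERT INTO MyTable (id, dup_column) VALUES (?, ?)",
    [(1, "x"), (2, "y"), (3, "x"), (4, "x"), (5, "z"), (6, "y")],
)

# Read the rows ordered by the duplicate key so duplicates sit next to each other.
rows = conn.execute(
    "SELECT id, dup_column FROM MyTable ORDER BY dup_column, id"
).fetchall()

prev_value = object()  # sentinel that never equals a real column value
ids_to_delete = []
for row_id, value in rows:
    if value == prev_value:
        # Same key as the previous row: mark this row as a duplicate.
        ids_to_delete.append(row_id)
    prev_value = value

# Delete by primary key, standing in for the positioned delete.
conn.executemany("DELETE FROM MyTable WHERE id = ?", [(i,) for i in ids_to_delete])
conn.commit()

survivors = conn.execute("SELECT id, dup_column FROM MyTable ORDER BY id").fetchall()
print(survivors)  # [(1, 'x'), (2, 'y'), (5, 'z')]
```

Because the query also orders by id within each key, the row with the lowest id is the one that is kept.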
Bill Karwin
A: 

For each distinct combination of the columns that should be unique, find the smallest Id, then delete all the other rows.

delete 
  from MyTable 
  where Id not in (
    select min(Id)
      from MyTable
      group by column1, column2 /* these are the unique columns */ 
   )

Once the data is cleaned, add a unique constraint/index on the unique columns too, as Tom H. suggested.
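
As a rough illustration of both steps, the sketch below runs the delete and then adds a unique index, using an in-memory SQLite database from Python as a stand-in for DB2 (an assumption for the demo only; the DB2 syntax for the index may differ). The sample rows and the index name ux_mytable_cols are hypothetical.

```python
import sqlite3

# Hypothetical demo data; SQLite is only a stand-in for DB2 here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable (Id, column1, column2) VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "a", "y"),
     (4, "b", "x"), (5, "a", "x"), (6, "b", "x")],
)

# Step 1: keep the smallest Id per (column1, column2) group, delete the rest.
conn.execute(
    """
    DELETE FROM MyTable
    WHERE Id NOT IN (
        SELECT MIN(Id) FROM MyTable GROUP BY column1, column2
    )
    """
)

# Step 2: stop new duplicates from getting in (hypothetical index name).
conn.execute("CREATE UNIQUE INDEX ux_mytable_cols ON MyTable (column1, column2)")
conn.commit()

kept = conn.execute(
    "SELECT Id, column1, column2 FROM MyTable ORDER BY Id"
).fetchall()

# A new duplicate should now be rejected by the unique index.
try:
    conn.execute("INSERT INTO MyTable (Id, column1, column2) VALUES (7, 'a', 'x')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(kept)                # [(1, 'a', 'x'), (3, 'a', 'y'), (4, 'b', 'x')]
print(duplicate_rejected)  # True
```

The index has to be created after the cleanup; creating it while duplicates still exist would fail.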

zsepi