When I try to create a unique index on a large table, I get a unique constraint violation error. The unique index in this case is a composite key of 4 columns.

Is there an efficient way to identify the duplicates other than:

select col1, col2, col3, col4, count(*)
from Table1
group by col1, col2, col3, col4
having count(*) > 1

The explain plan for this query shows a full table scan with an extremely high cost, so I want to find out if there is another way.

Thanks !

+1  A: 

Since there is no index on those columns, that query would have to do a full table scan - no other way to do it really, unless one or more of those columns is already indexed.

You could create the index as a non-unique index, then run the query to identify the duplicate rows (which should be very fast once the index is created). But I doubt if the combined time of creating the non-unique index then running the query would be any less than just running the query without the index.
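The two-step approach would look something like this (the index name is illustrative):

-- non-unique index on the candidate key columns
create index t1_dup_ix on Table1 (col1, col2, col3, col4);

-- the duplicate search can now be satisfied from the index
select col1, col2, col3, col4, count(*)
from Table1
group by col1, col2, col3, col4
having count(*) > 1;

As noted, building the index itself requires a full scan of the table, so the combined cost is unlikely to beat running the query once without the index.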

Eric Petroelje
A: 

I don't think there is a quicker way unfortunately.

Adrian
+6  A: 
Jeffrey Hantin
After you have resolved your non-unique issue, you can enforce the unique constraint using the non-unique index you have created. Oracle won't let you create a unique index while you have a non-unique index on the same columns, so if you REALLY want a unique index, create your non-unique index as create index t_ix on table1(col1,col2,col3,col4,1); Because of the literal at the end, that index is on a different column list, so it won't stop you later creating the unique index on (col1,col2,col3,col4) and then dropping the non-unique index.
Gary
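Gary's comment spelled out as a full sequence might look like this (index names are illustrative; the cleanup step is whatever you decide to do with the duplicates):

-- non-unique index with a trailing literal, so the column list differs
create index t_ix on Table1 (col1, col2, col3, col4, 1);

-- ... remove or renumber the duplicate rows here ...

-- now the unique index on the bare column list is allowed
create unique index t_uix on Table1 (col1, col2, col3, col4);
drop index t_ix;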
All replies indicated that there is no easy way out of this problem. But this answer also gave me an approach, so I picked this as the best answer to my problem. Thanks Jeff.
Vj_plugs_in
A: 

Try this :

select col1, col2, col3, col4 from Table1 t1
where rowid > (select min(rowid) from Table1 t2
               where t2.col1 = t1.col1
               and t2.col2 = t1.col2
               and t2.col3 = t1.col3
               and t2.col4 = t1.col4);
Sai Ganesh
@Sai - A different way of doing the same thing, but would almost certainly still result in a full table scan unless at least one of those columns was indexed.
Eric Petroelje
So, replace one full table scan with two full table scans?
Jeffrey Kemp
+1  A: 

In fact, you need to look for a duplicate of every single row in the table. There is no way to do that efficiently without an index.

+2  A: 

You can use the EXCEPTIONS INTO clause to trap the duplicated rows.

If you don't already have an EXCEPTIONS table create one using the provided script:

SQL>  @$ORACLE_HOME/rdbms/admin/utlexcpt.sql

Now you can attempt to create the unique constraint like this:

alter table Table1
add  constraint tab1_uq UNIQUE (col1, col2, col3, col4)
exceptions into exceptions
/

This will fail but now your EXCEPTIONS table contains a list of all the rows whose keys contain duplicates, identified by ROWID. That gives you a basis for deciding what to do with the duplicates (delete, renumber, whatever).
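You can then inspect the offending rows by joining back on ROWID (assuming the default EXCEPTIONS table created by the script, whose ROWID column is named ROW_ID):

select t.rowid, t.col1, t.col2, t.col3, t.col4
from Table1 t
where t.rowid in (select row_id from exceptions);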

edit

As others have noted you have to pay the cost of scanning the table once. This approach gives you a permanent set of the duplicated rows, and ROWID is the fastest way of accessing any given row.

APC