I have rows in an Oracle database table that should be unique for a combination of two fields, but the unique constraint is not set up on the table, so I need to find all rows that violate the constraint myself using SQL. Unfortunately, my meager SQL skills aren't up to the task.

My table has three relevant columns: entity_id, station_id, and obs_year. For each row, the combination of station_id and obs_year should be unique, and I want to find out whether any rows violate this by flushing them out with an SQL query.

I have tried the following SQL (suggested by this previous question), but it doesn't work for me (I get ORA-00918: column ambiguously defined):

SELECT
    entity_id, station_id, obs_year
FROM
    mytable t1
INNER JOIN (
    SELECT entity_id, station_id, obs_year FROM mytable
    GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
    t1.station_id = dupes.station_id AND
    t1.obs_year = dupes.obs_year

Can someone suggest what I'm doing wrong, and/or how to solve this? Thanks in advance for your help.

--James

+2  A: 

Here is a rewrite of your query:

SELECT
    t1.entity_id, t1.station_id, t1.obs_year
FROM
    mytable t1
INNER JOIN (
    SELECT entity_id, station_id, obs_year FROM mytable
    GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
    t1.station_id = dupes.station_id AND
    t1.obs_year = dupes.obs_year

I think the ambiguous column error (ORA-00918) occurred because you were selecting columns whose names appear in both the table and the subquery, but you did not specify whether you wanted them from dupes or from mytable (aliased as t1).
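To see the difference outside Oracle, here is a minimal sketch that reproduces the error. It assumes Python's built-in sqlite3 module as a stand-in (SQLite reports an "ambiguous column name" error rather than ORA-00918), and the sample rows are hypothetical. Qualifying the outer columns with t1 makes the query run, although with entity_id still in the GROUP BY it finds no duplicates when entity_id is unique.

```python
import sqlite3

# Hypothetical sample rows (entity_id, station_id, obs_year); entity_id is
# unique, and entities 1 and 2 share the pair (10, 2001).
ROWS = [(1, 10, 2001), (2, 10, 2001), (3, 10, 2002), (4, 20, 2001)]

# The query from the question: the outer SELECT list is not qualified.
UNQUALIFIED = """
SELECT entity_id, station_id, obs_year
FROM mytable t1
INNER JOIN (SELECT entity_id, station_id, obs_year FROM mytable
            GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON t1.station_id = dupes.station_id AND t1.obs_year = dupes.obs_year
"""

# Same query with every outer column qualified by the t1 alias.
QUALIFIED = """
SELECT t1.entity_id, t1.station_id, t1.obs_year
FROM mytable t1
INNER JOIN (SELECT entity_id, station_id, obs_year FROM mytable
            GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes
ON t1.station_id = dupes.station_id AND t1.obs_year = dupes.obs_year
"""


def run(sql):
    """Run sql against a fresh in-memory table; return the rows or the error text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()


print(run(UNQUALIFIED))  # an "ambiguous column name" error message
print(run(QUALIFIED))    # []: it runs, but grouping on entity_id hides the duplicates
```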

FrustratedWithFormsDesigner
+1  A: 

Could you not create a new table that includes the unique constraint, and then copy across the data row by row, ignoring failures?
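A rough sketch of that idea, again using SQLite through Python's sqlite3 module as a stand-in for Oracle. The sample rows, the table name mytable_new and the constraint name uq_station_year are all hypothetical. Each row is inserted individually, and any row rejected by the UNIQUE (station_id, obs_year) constraint is collected instead of copied.

```python
import sqlite3

# Hypothetical sample rows; entities 1 and 2 collide on (10, 2001).
ROWS = [(1, 10, 2001), (2, 10, 2001), (3, 10, 2002), (4, 20, 2001)]


def copy_with_constraint(rows):
    """Copy rows into a table with UNIQUE (station_id, obs_year).

    Returns (kept, rejected): the rows that were inserted and the rows
    that violated the constraint and were skipped.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE mytable_new (
            entity_id  INTEGER PRIMARY KEY,
            station_id INTEGER NOT NULL,
            obs_year   INTEGER NOT NULL,
            CONSTRAINT uq_station_year UNIQUE (station_id, obs_year))
    """)
    rejected = []
    for row in rows:
        try:
            conn.execute("INSERT INTO mytable_new VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            # Duplicate (station_id, obs_year): skip it but remember it.
            rejected.append(row)
    kept = conn.execute("SELECT * FROM mytable_new ORDER BY entity_id").fetchall()
    conn.close()
    return kept, rejected


kept, rejected = copy_with_constraint(ROWS)
print(rejected)  # [(2, 10, 2001)]
```

Which row of a duplicate pair survives depends purely on the copy order, so the rejected list is worth reviewing before the old table is dropped.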

fredley
Yes, this is a good idea, thanks! BTW, I'm trying to figure out how to create the constraint on my table using annotations in my entity class (I'm a Java developer using JPA/Hibernate); see http://stackoverflow.com/questions/3504477/how-to-specify-that-a-combination-of-columns-should-be-a-unique-constraint-using
James Adams
+2  A: 

Change the three columns in the outer SELECT to:

SELECT
t1.entity_id, t1.station_id, t1.obs_year
Basiclife
Yes, good eye. Thanks!
James Adams
+2  A: 
SELECT  *
FROM    (
        SELECT  t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
        FROM    mytable t
        )
WHERE   rn > 1
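A small sketch of how this query behaves, assuming SQLite 3.25 or later (needed for window functions) via Python's sqlite3 module in place of Oracle, and hypothetical sample data. Within each (station_id, obs_year) partition the row with the lowest entity_id gets rn = 1, so the filter rn > 1 lists only the extra copies. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical sample rows: three entities (1, 2, 5) share (10, 2001).
ROWS = [(1, 10, 2001), (2, 10, 2001), (5, 10, 2001), (3, 10, 2002), (4, 20, 2001)]

QUERY = """
SELECT entity_id, station_id, obs_year, rn
FROM (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year
                                   ORDER BY entity_id) AS rn
    FROM mytable t
)
WHERE rn > 1
ORDER BY entity_id
"""


def extra_copies(rows):
    """Return every row after the first in each (station_id, obs_year) group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(extra_copies(ROWS))  # [(2, 10, 2001, 2), (5, 10, 2001, 3)]
```

Because the first row of each group is excluded, this form is handy if you later want to delete the surplus rows and keep exactly one per pair.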
Quassnoi
+1, very clever!
FrustratedWithFormsDesigner
Thanks a lot for this response. Unfortunately when I run this I get an "ORA-00923: FROM keyword not found where expected" message.
James Adams
@James: try now
Quassnoi
Thanks, Quassnoi!
James Adams
+1  A: 
SELECT entity_id, station_id, obs_year
FROM mytable t1
WHERE EXISTS (SELECT 1 FROM mytable t2
              WHERE t1.station_id = t2.station_id
                AND t1.obs_year = t2.obs_year
                AND t1.ROWID <> t2.ROWID)
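A minimal sketch of the EXISTS approach, assuming SQLite via Python's sqlite3 module as a stand-in; SQLite's rowid plays the role of Oracle's ROWID here, and the sample data is hypothetical. Unlike the ROW_NUMBER query, this returns every row that has a twin, including the first one.

```python
import sqlite3

# Hypothetical sample rows; entities 1 and 2 share (10, 2001).
ROWS = [(1, 10, 2001), (2, 10, 2001), (3, 10, 2002), (4, 20, 2001)]

QUERY = """
SELECT entity_id, station_id, obs_year
FROM mytable t1
WHERE EXISTS (SELECT 1 FROM mytable t2
              WHERE t1.station_id = t2.station_id
                AND t1.obs_year = t2.obs_year
                AND t1.rowid <> t2.rowid)  -- a different physical row
ORDER BY entity_id
"""


def rows_with_twins(rows):
    """Return every row whose (station_id, obs_year) appears on another row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(rows_with_twins(ROWS))  # [(1, 10, 2001), (2, 10, 2001)]
```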
Michael Pakhantsov
Thanks, Michael, I like this simple approach.
James Adams
+1  A: 

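You need to qualify the columns in the outer SELECT with the table alias. Also, assuming entity_id is the unique key for mytable and is irrelevant to finding duplicates, you should not group on it in the dupes subquery: if every entity_id is distinct, every group contains a single row, so HAVING COUNT(*) > 1 can never be true.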

Try:

SELECT t1.entity_id, t1.station_id, t1.obs_year
FROM mytable t1
INNER JOIN (
    SELECT station_id, obs_year FROM mytable
    GROUP BY station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
    t1.station_id = dupes.station_id AND
    t1.obs_year = dupes.obs_year
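A quick way to see why entity_id should not be in the GROUP BY, sketched with SQLite through Python's sqlite3 module and hypothetical sample rows in which entity_id is unique and entities 1 and 2 share (10, 2001). An ORDER BY is added to the full query only to make the output order predictable.

```python
import sqlite3

# Hypothetical sample rows (entity_id, station_id, obs_year).
ROWS = [(1, 10, 2001), (2, 10, 2001), (3, 10, 2002), (4, 20, 2001)]

# Grouping on entity_id as well: with a unique entity_id every group holds
# exactly one row, so HAVING COUNT(*) > 1 filters everything out.
WITH_ENTITY = """
SELECT entity_id, station_id, obs_year FROM mytable
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1
"""

# Grouping only on the pair that is supposed to be unique.
WITHOUT_ENTITY = """
SELECT station_id, obs_year, COUNT(*) FROM mytable
GROUP BY station_id, obs_year HAVING COUNT(*) > 1
"""

# The full query from this answer, joined back to list the offending rows.
FULL = """
SELECT t1.entity_id, t1.station_id, t1.obs_year
FROM mytable t1
INNER JOIN (SELECT station_id, obs_year FROM mytable
            GROUP BY station_id, obs_year HAVING COUNT(*) > 1) dupes
ON t1.station_id = dupes.station_id AND t1.obs_year = dupes.obs_year
ORDER BY t1.entity_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

print(conn.execute(WITH_ENTITY).fetchall())     # []
print(conn.execute(WITHOUT_ENTITY).fetchall())  # [(10, 2001, 2)]
print(conn.execute(FULL).fetchall())            # [(1, 10, 2001), (2, 10, 2001)]
```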
Mark Bannister
Thanks, Mark, for the tip about not using entity_id in the grouping subquery, and for the illustrative example.
James Adams