ansaurus

Question

Check that the values in all the columns are different in sql - Oracle

Answer 1

+3 A:

if this is a data constraint - then i would put it in a trigger - and compare each value in PLSQL. (i think this is recommended based on your description)

if it is a query, then you will have a pretty large where clause checking each column against the others.

edit:

something like:

select * from mytable where
( col1=col2 or col1=col3 or col1=col4 or col1=col5 ... )
or
( col2=col3 or col2=col4 or col2=col5 ... )
or
( col3=col4 or col3=col5 ... )
etc...

Randy 2010-07-08 14:36:10

Can this not be done with a sql query where i choose all the rows which have columns with similar data.

Arkid 2010-07-08 14:38:13

Unless I'm misunderstanding, it's a pretty large check in the trigger or a pretty large WHERE clause in a query... I'm with Andy on this one: factor the columns out into another table.

Justin K 2010-07-08 14:41:42

The long query will work for sure. Is there no smaller and optimized query to do the same.

Arkid 2010-07-08 15:06:57

It is not a transactional data. it is analytics based data where columns cannot be factored out.

Arkid 2010-07-08 15:10:11

I'm inclined to think it cannot. This solution looks horrific, but all the tests have to be made, so the query must make them.

Brian Hooper 2010-07-08 16:28:18

Answer 2

+4 A:

It sounds to me that those 14 columns are denormalized, and should actually be a subtable with a unique index constraint on them.

Andy Lester 2010-07-08 14:37:50

Actually it is a pretty big dataset.

Arkid 2010-07-08 14:40:18

good comment, i assumed there was a data cleansing effort required in my answer. A proper normalized structure is always best. :)

Randy 2010-07-08 14:43:47

A composite unique index would only catch duplicates of the same set of column values. IE: 14 of 14 have to match, 13 of 14 will not cause a unique constraint error.

OMG Ponies 2010-07-08 15:47:57

Something like this could be done...one could create a table (RowNumber, ColumnNumber, Value), and then create a unique constraint on (RowNumber, Value).

Brian Hooper 2010-07-08 16:26:57

Answer 3

+3 A:

Here is another way to do it in SQL. This example is for three columns. You simply add more union all select ... clauses to add more columns to the check. It will return the ROWID of any row for which the total number of columns is greater than the number of distinct values.

I'm assuming that none of the columns can contain NULLs. If that is an issue this might give you false positives.

select rowid,count(*),count(distinct col)
from (
select rowid,col1 col from the_table
  union all
select rowid,col2 col from the_table
  union all
select rowid,col3 col from the_table
)
group by rowid
having count(distinct col) < count(*)

Dave Costa 2010-07-08 15:48:03

+1 for getting there nearl 30 minutes faster than it took me to prove the method

Mark Baker 2010-07-08 16:23:30

Answer 4

A:

A very dirty, kludgey method.... but it should work with 9i

SELECT *
  FROM ( SELECT keyIdentifierField,
                COUNT(testField) AS fieldCount,
                COUNT(DISTINCT testField) AS distinctFieldCount
           FROM ( SELECT keyIdentifierField,
                         col1 AS testField
                    FROM myTable
                  UNION ALL
                  SELECT keyIdentifierField,
                         col2 AS testField
                    FROM myTable
                  UNION ALL
                  SELECT keyIdentifierField,
                         col3 AS testField
                    FROM myTable
                  ...
                  UNION ALL
                  SELECT keyIdentifierField,
                         col14 AS testField
                    FROM myTable
                )
          GROUP BY keyIdentifierField
       )
 WHERE fieldCount <> distinctFieldCount

Mark Baker 2010-07-08 16:14:27

Looks like Dave Costa posted a very similar solution while I was working this out.... but I can't be bothered to delete my reply after typing it all in

Mark Baker 2010-07-08 16:16:30

Answer 5

A:

If you only have ONE possible duplicated number, there are possible solutions using bit operators. If you're using Oracle 11, you could UNPIVOT the column data into rows,

Presuming your table has a unique key called UniqueKey

SELECT  UniqueKey,cvalue
FROM
(SELECT *
FROM yourTable
   UNPIVOT INCLUDE NULLS (cvalue FOR check_values IN (col1, col2, col3, col4))
)
GROUP BY UniqueKey,cvalue HAVING count(cvalue) > 1

This may be syntactically a bit off as I do not have an Ora 11 test db to hand.

Prior to Oracle 11 the only way to do something like this would be using Object types and pl/sql (function that given an Id, returns the list of column values on that row as a Collection, then join that function to the Id).

It's probably easier to just do it in pl/sql (create an array indexed by Integer, for each column check .exists(value) before setting - if it exists, you have your duplicate). Call function once per column per row, and reset array between rows.

Alternatively - if this is a one off integrity check - create a temp table on key / value pairs, unique key on the pair, then insert each value in each row against the key, and let Oracle throw out your errors.

JulesLt 2010-07-08 16:51:51

Answer 6

+4 A:

Check the following examples. Firstly I group the values into a collect. I've used the built in SYS.DBMS_DEBUG_VC2COLL but it is better if you create your won. Then I take that collection then do a MULTISET UNION DISTINCT with an empty collection of the same type. That will remove duplicated entries.

select * from
 (select sys.dbms_debug_vc2coll(1,2,3) a, 
         sys.dbms_debug_vc2coll(1,2,3) multiset union distinct sys.dbms_debug_vc2coll() b
 from dual)
where a=b;

select * from
 (select sys.dbms_debug_vc2coll(1,2,3,1) a, 
         sys.dbms_debug_vc2coll(1,2,3,1) multiset union distinct sys.dbms_debug_vc2coll() b
 from dual)
where a=b

Finally I compare that DISTINCT'd collection with the original collection. If they match, then the collection already only had unique values.

Gary 2010-07-08 23:16:18

+1 very nice. .

Jeffrey Kemp 2010-07-09 00:36:49

IIRC support for MULTISET UNION was a new feature in 10g and the OP specified 9i. But +1 for the slickest solution by far.

APC 2010-07-09 06:11:13

Answer 7

A:

Another approach that will work in 9i. I make no guarantees for performance, only that this is a pure SQL approach that works on 9i – but running for 10,000 rows with 5 columns was sub-second, so it’s reasonable.

create table test (
   uniqueKey number, 
   c1 number,
   c2 number,
   c3 number,
   c4 number,
   c5 number)

Build some test cases – every 4th row is bad – we use a pseudo-table of integers

    insert into test 
    select  r,1,2,3,4,CASE WHEN MOD(r,4)=0 THEN 4 ELSE 5 END
    FROM (SELECT rownum r from dual connect by rownum <= 10000);

Now convert the column values back into rows by using the same technique of a pseudo-table over the sequence of integers

SELECT uniqueKey,r from
(
    SELECT rownum r from dual connect by rownum <= 100 
/* 100 is the max value in any of our columns */
) numbers,
  test
WHERE r in (c1,c2,c3,c4,c5)

This gives the set of unique numbers against each row.

To find invalid rows, then just check if the count is less than the number of columns

SELECT uniqueKey  from
(
    SELECT rownum r from dual connect by rownum <= 100 /* Our max potential value */
) numbers,
  test
WHERE r in (c1,c2,c3,c4,c5)
GROUP BY uniqueKey
HAVING COUNT(r) < 5

Unfortunately, this doesn't tell you which values are duplicated, but it does identify your problem rows.

JulesLt 2010-07-09 11:10:59

ansaurus

tags:

views:

answers:

Check that the values in all the columns are different in sql - Oracle

related questions