ansaurus

Question

Filter SQL query by a unique set of column values, regardless of their order

Answer 1

+1 A:

select distinct
case when PERSON_1>=PERSON_2 then PERSON_1 ELSE PERSON_2 END person_a,
case when PERSON_1>=PERSON_2 then PERSON_2 ELSE PERSON_1 END person_b
FROM RELATIONSHIPS;
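One quick way to check the CASE-based normalization is to run it against SQLite through Python's standard sqlite3 module. This is a minimal sketch, not the answerer's code. SQLite stands in for Oracle, and the four sample rows are hypothetical: (1,2) and (2,1) describe the same pair.

```python
import sqlite3

# Hypothetical sample rows; (1, 2) and (2, 1) describe the same pair.
ROWS = [(1, 2, 0), (2, 1, 0), (1, 3, 0), (4, 3, 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationships (person_1 INTEGER NOT NULL, "
             "person_2 INTEGER NOT NULL, relationship INTEGER NOT NULL, "
             "PRIMARY KEY (person_1, person_2))")
conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)", ROWS)

# Put the larger id first in every row, then let DISTINCT collapse
# the two orderings of the same pair into a single row.
pairs = conn.execute("""
    SELECT DISTINCT
           CASE WHEN person_1 >= person_2 THEN person_1 ELSE person_2 END AS person_a,
           CASE WHEN person_1 >= person_2 THEN person_2 ELSE person_1 END AS person_b
      FROM relationships
     ORDER BY person_a, person_b
""").fetchall()
print(pairs)  # [(2, 1), (3, 1), (4, 3)]
```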

tekBlues 2009-06-10 19:32:39

Answer 2

A:

I think something like this should do the trick:

select * from RELATIONSHIPS group by PERSON_1, PERSON_2

Aistina 2009-06-10 19:33:14

Answer 3

+4 A:

Is the relationship always there in both directions? That is, if John and Jill are related, is there always both a {John,Jill} and a {Jill,John} row? If so, just limit the query to rows where Person_1 < Person_2 and take the distinct set.
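The filter can be sketched as follows, assuming every relationship really is stored in both directions. SQLite (through Python's sqlite3 module) stands in for Oracle, and the rows are hypothetical. The last step adds a one-directional row with person_1 > person_2 to show what happens when that assumption does not hold: the pair is silently dropped.

```python
import sqlite3

# Hypothetical rows: every relationship is stored in both directions,
# which is the assumption this approach depends on.
ROWS = [(1, 2, 0), (2, 1, 0), (1, 3, 0), (3, 1, 0)]

QUERY = """
    SELECT DISTINCT person_1, person_2
      FROM relationships
     WHERE person_1 < person_2      -- keep only the ascending copy
     ORDER BY person_1, person_2
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationships (person_1 INTEGER NOT NULL, "
             "person_2 INTEGER NOT NULL, relationship INTEGER NOT NULL)")
conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)", ROWS)

symmetric = conn.execute(QUERY).fetchall()
print(symmetric)  # [(1, 2), (1, 3)]

# A row stored in one direction only, with person_1 > person_2,
# is filtered out by the same query.
conn.execute("INSERT INTO relationships VALUES (4, 3, 0)")
one_way = conn.execute(QUERY).fetchall()
print(one_way)  # [(1, 2), (1, 3)]
```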

Marc Gravell 2009-06-10 19:34:04

I am impressed by the stupidity of my own solution.

tekBlues 2009-06-10 19:43:34

The relationship does not always exist in both directions. In fact, it rarely does. However, it is possible, so I need to be able to filter out the duplicates. Unfortunately, the database is already in production, so this approach won't work because some records exist where Person_2 < Person_1 (and vice versa). I guess I could add a new rule to the table and then update the existing records, but we're talking about 25M+ records on a production server. Not sure I want to do that. :)

Kevin Babcock 2009-06-11 04:12:41

@tekBlues: No such thing as a "stupid" solution. All ideas are welcome. In fact, many of these ideas are very helpful, even if they don't all solve my specific question.

Kevin Babcock 2009-06-11 04:13:46

Answer 4

+2 A:

You should create a constraint on your Relationships table so that the numeric person_1 value must be less than the numeric person_2 value.

create table RELATIONSHIPS (
    PERSON_1 number not null,
    PERSON_2 number not null,
    RELATIONSHIP  number not null,
    constraint PK_RELATIONSHIPS
        primary key (PERSON_1, PERSON_2),
    constraint UNIQ_RELATIONSHIPS
        CHECK (PERSON_1 < PERSON_2)
);

That way you can be sure that (2,1) can never be inserted -- it would have to be (1,2). Then your PRIMARY KEY constraint will prevent duplicates.
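A minimal sketch of this constraint-based design, run in SQLite through Python's sqlite3 module. INTEGER replaces Oracle's NUMBER, and the try_insert helper is hypothetical. The CHECK constraint rejects the reversed pair, and the primary key rejects the exact duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same design as the answer: composite primary key plus a CHECK that forces
# the smaller id into person_1.
conn.execute("""
    CREATE TABLE relationships (
        person_1     INTEGER NOT NULL,
        person_2     INTEGER NOT NULL,
        relationship INTEGER NOT NULL,
        CONSTRAINT pk_relationships PRIMARY KEY (person_1, person_2),
        CONSTRAINT uniq_relationships CHECK (person_1 < person_2)
    )
""")

def try_insert(p1, p2):
    """Hypothetical helper: True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO relationships VALUES (?, ?, 0)", (p1, p2))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(1, 2),  # accepted
    try_insert(2, 1),  # rejected by the CHECK constraint
    try_insert(1, 2),  # rejected by the primary key
]
print(results)  # [True, False, False]
```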

PS: I see Marc Gravell has answered more quickly than I have, with a similar solution.

Bill Karwin 2009-06-10 19:37:05

I considered that - but the problem then is that it complicates the "relationship" value - i.e. depending on the *numbers* you'd have to have either "father" or "son" - very hard to manage. It is also hard to do queries in an expected direction, as you'd need to try both... (with inverted relationship). It seems simpler to keep the relationship in both directions; disk space is cheap, and a simpler query on more data will generally out-perform a complex query on a bit less data.

Marc Gravell 2009-06-10 19:50:27

Yes, as you say, if you need to search for a relationship of a given type, or if relationships aren't always reciprocal, then my solution above doesn't solve the problem.

Bill Karwin 2009-06-10 20:17:43

Answer 5

A:

I think KM almost got it right; I added concat.

SELECT DISTINCT *
    FROM (SELECT DISTINCT concat(Person_1,Person_2) FROM RELATIONSHIPS
          UNION 
          SELECT DISTINCT concat(Person_2, Person_1) FROM RELATIONSHIPS
         ) dt

MikeNereson 2009-06-10 19:41:07

Answer 6

A:

It's kludgy as heck, but it will at least tell you which unique combinations you have, just not in a very handy form...

select distinct(case when person_1 <= person_2 then person_1||'|'||person_2 else person_2||'|'||person_1 end)
from relationships;
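The string-key query carries over almost unchanged to SQLite, which also uses || for concatenation. This sketch runs it from Python's sqlite3 module against hypothetical rows.

```python
import sqlite3

# Hypothetical rows; (1, 2) and (2, 1) describe the same pair.
ROWS = [(1, 2, 0), (2, 1, 0), (1, 3, 0), (4, 3, 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationships (person_1 INTEGER NOT NULL, "
             "person_2 INTEGER NOT NULL, relationship INTEGER NOT NULL)")
conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)", ROWS)

# Build a 'smaller|larger' text key for each row; DISTINCT removes repeats.
keys = [row[0] for row in conn.execute("""
    SELECT DISTINCT CASE WHEN person_1 <= person_2
                         THEN person_1 || '|' || person_2
                         ELSE person_2 || '|' || person_1 END AS pair_key
      FROM relationships
     ORDER BY pair_key
""")]
print(keys)  # ['1|2', '1|3', '3|4']
```

The '|' separator is what keeps the keys unambiguous: without it, the pairs (1, 23) and (12, 3) would both produce the key '123'.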

copaX 2009-06-10 19:42:17

Looks like tekBlues nailed it: http://stackoverflow.com/questions/977648/filter-sql-query-by-a-unique-set-of-column-values-regardless-of-their-order/977695#977695

copaX 2009-06-10 19:44:26

Answer 7

+3 A:

Untested:

select least(person_1,person_2)
     , greatest(person_1,person_2)
  from relationships
 group by least(person_1,person_2)
     , greatest(person_1,person_2)
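Because the query above is marked untested, here is a sketch of the same GROUP BY idea run in SQLite through Python's sqlite3 module. SQLite has no LEAST or GREATEST function, so this sketch substitutes SQLite's multi-argument scalar min() and max(), which return the smaller and the larger argument. The sample rows are hypothetical.

```python
import sqlite3

# Hypothetical rows; (1, 2) and (2, 1) describe the same pair.
ROWS = [(1, 2, 0), (2, 1, 0), (1, 3, 0), (4, 3, 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationships (person_1 INTEGER NOT NULL, "
             "person_2 INTEGER NOT NULL, relationship INTEGER NOT NULL)")
conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)", ROWS)

# With two arguments, min() and max() are scalar functions in SQLite,
# so they play the role of LEAST and GREATEST here.
pairs = conn.execute("""
    SELECT min(person_1, person_2) AS low_id,
           max(person_1, person_2) AS high_id
      FROM relationships
     GROUP BY min(person_1, person_2), max(person_1, person_2)
     ORDER BY low_id, high_id
""").fetchall()
print(pairs)  # [(1, 2), (1, 3), (3, 4)]
```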

To prevent such double entries, you can add a unique index, using the same idea (tested!):

SQL> create table relationships
  2  ( person_1 number not null
  3  , person_2 number not null
  4  , relationship number not null
  5  , constraint pk_relationships primary key (person_1, person_2)
  6  )
  7  /

Table created.

SQL> create unique index ui_relationships on relationships(least(person_1,person_2),greatest(person_1,person_2))
  2  /

Index created.

SQL> insert into relationships values (1,2,0)
  2  /

1 row created.

SQL> insert into relationships values (1,3,0)
  2  /

1 row created.

SQL> insert into relationships values (2,1,0)
  2  /
insert into relationships values (2,1,0)
*
ERROR at line 1:
ORA-00001: unique constraint (RWIJK.UI_RELATIONSHIPS) violated
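SQLite also supports unique indexes on expressions, so the session above can be reproduced roughly as follows from Python's sqlite3 module, again swapping LEAST/GREATEST for SQLite's scalar min()/max(). The try_insert helper is hypothetical; it reports whether each insert was accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE relationships (
        person_1     INTEGER NOT NULL,
        person_2     INTEGER NOT NULL,
        relationship INTEGER NOT NULL,
        CONSTRAINT pk_relationships PRIMARY KEY (person_1, person_2)
    )
""")
# Unique index on the normalized pair; min()/max() stand in for LEAST/GREATEST.
conn.execute("""
    CREATE UNIQUE INDEX ui_relationships
        ON relationships (min(person_1, person_2), max(person_1, person_2))
""")

def try_insert(p1, p2):
    """Hypothetical helper: True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO relationships VALUES (?, ?, 0)", (p1, p2))
        return True
    except sqlite3.IntegrityError:
        return False

# Same sequence as the SQL*Plus session: (2, 1) clashes with (1, 2).
results = [try_insert(1, 2), try_insert(1, 3), try_insert(2, 1)]
print(results)  # [True, True, False]
```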

Regards, Rob.

Rob van Wijk 2009-06-10 19:42:24

Answer 8

+1 A:

There's some uncertainty as to whether you want to prevent duplicates from being inserted into the database. You might just want to fetch unique pairs, while preserving the duplicates.

So here's an alternative solution for the latter case, querying unique pairs even if duplicates exist:

SELECT r1.*
FROM Relationships r1
LEFT OUTER JOIN Relationships r2
  ON (r1.person_1 = r2.person_2 AND r1.person_2 = r2.person_1)
WHERE r1.person_1 < r1.person_2
  OR  r2.person_1 IS NULL;

So if there is a matching row with the IDs reversed, the WHERE clause decides which one the query prefers: the one whose IDs are in ascending order.

If there is no matching row, the r2 columns are NULL (that is how an outer join works), so the query simply returns whatever is in r1.

There is no need for GROUP BY or DISTINCT, because the primary key on (person_1, person_2) allows at most one matching row.
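The outer-join query can be tried in SQLite from Python's sqlite3 module with only cosmetic changes. This is a sketch; the sample rows are hypothetical: (1,2) and (2,1) are a reversed pair, while (1,3) and (4,3) appear in one direction only.

```python
import sqlite3

ROWS = [(1, 2, 0), (2, 1, 0), (1, 3, 0), (4, 3, 0)]  # hypothetical

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationships (person_1 INTEGER NOT NULL, "
             "person_2 INTEGER NOT NULL, relationship INTEGER NOT NULL, "
             "PRIMARY KEY (person_1, person_2))")
conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)", ROWS)

# Keep r1 if it is the ascending copy of a reversed pair,
# or if it has no reversed partner at all (r2 columns are NULL).
pairs = conn.execute("""
    SELECT r1.person_1, r1.person_2
      FROM relationships r1
      LEFT OUTER JOIN relationships r2
        ON r1.person_1 = r2.person_2 AND r1.person_2 = r2.person_1
     WHERE r1.person_1 < r1.person_2
        OR r2.person_1 IS NULL
     ORDER BY r1.person_1, r1.person_2
""").fetchall()
print(pairs)  # [(1, 2), (1, 3), (4, 3)]
```

The reversed pair collapses to (1, 2), while (4, 3) is returned in its stored order because it has no reversed partner.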

Trying this in MySQL, I get the following optimization plan:

+----+-------------+-------+--------+---------------+---------+---------+-----------------------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                               | rows | Extra                    |
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------------+------+--------------------------+
|  1 | SIMPLE      | r1    | ALL    | NULL          | NULL    | NULL    | NULL                              |    2 |                          | 
|  1 | SIMPLE      | r2    | eq_ref | PRIMARY       | PRIMARY | 8       | test.r1.person_2,test.r1.person_1 |    1 | Using where; Using index | 
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------------+------+--------------------------+

This seems to be a reasonably good use of indexes.

Bill Karwin 2009-06-10 20:30:36

You are correct, I don't want to prevent duplicates in the table, but rather want to fetch unique pairs. Thanks for your suggestion. Unfortunately this syntax doesn't seem to work in Oracle. :(

Kevin Babcock 2009-06-11 04:07:01

I've edited the ON clause in my query example. Does that work any better?

Bill Karwin 2009-06-11 04:17:47

Answer 9

A:

Possibly the simplest solution (one that requires neither altering the data structure nor creating triggers) is to build a result set without the duplicated entries, then add back one entry from each duplicated pair.

It would look something like this:

 select * from relationships where rowid not in 
    (select a.rowid from  relationships a,relationships b 
       where a.person_1=b.person_2 and a.person_2=b.person_1)
union all
 select * from relationships where rowid in 
    (select a.rowid from  relationships a,relationships b where 
       a.person_1=b.person_2 and a.person_2=b.person_1 and a.person_1>a.person_2)

That said, I almost never create a table without a single-column primary key.
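SQLite tables carry an implicit rowid too, so the rowid-based query can be run nearly verbatim from Python's sqlite3 module. This is a sketch against hypothetical rows; since UNION ALL gives no ordering guarantee, the result is sorted in Python before it is inspected.

```python
import sqlite3

ROWS = [(1, 2, 0), (2, 1, 0), (1, 3, 0), (4, 3, 0)]  # hypothetical

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationships (person_1 INTEGER NOT NULL, "
             "person_2 INTEGER NOT NULL, relationship INTEGER NOT NULL, "
             "PRIMARY KEY (person_1, person_2))")
conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)", ROWS)

QUERY = """
    SELECT * FROM relationships WHERE rowid NOT IN      -- rows with no reversed partner
        (SELECT a.rowid FROM relationships a, relationships b
          WHERE a.person_1 = b.person_2 AND a.person_2 = b.person_1)
    UNION ALL
    SELECT * FROM relationships WHERE rowid IN          -- one copy of each reversed pair
        (SELECT a.rowid FROM relationships a, relationships b
          WHERE a.person_1 = b.person_2 AND a.person_2 = b.person_1
            AND a.person_1 > a.person_2)
"""
# UNION ALL does not guarantee row order, so sort before inspecting.
result = sorted(conn.execute(QUERY).fetchall())
print(result)  # [(1, 3, 0), (2, 1, 0), (4, 3, 0)]
```

Because of the a.person_1 > a.person_2 condition, this query keeps the descending copy (2, 1) of the reversed pair.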

2009-06-11 12:00:12

Answer 10

A:

You could just use ROW_NUMBER():

with rel as (
select r.*,
       row_number() over (partition by least(r.person_1, r.person_2),
                                       greatest(r.person_1, r.person_2)
                          order by r.person_1) as rn
  from relationships r
       )
select *
  from rel
 where rn = 1;
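The ROW_NUMBER() approach can be sketched in SQLite (version 3.25 or later, which added window functions) from Python's sqlite3 module. As in the other sketches, SQLite's scalar min()/max() stand in for LEAST/GREATEST, and the rows are hypothetical. The ORDER BY inside OVER decides which copy of a reversed pair survives; ordering by person_1 keeps the ascending one.

```python
import sqlite3

ROWS = [(1, 2, 0), (2, 1, 0), (1, 3, 0), (4, 3, 0)]  # hypothetical

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationships (person_1 INTEGER NOT NULL, "
             "person_2 INTEGER NOT NULL, relationship INTEGER NOT NULL, "
             "PRIMARY KEY (person_1, person_2))")
conn.executemany("INSERT INTO relationships VALUES (?, ?, ?)", ROWS)

# Number the rows within each unordered pair and keep the first one.
rows = conn.execute("""
    WITH rel AS (
        SELECT r.*,
               row_number() OVER (
                   PARTITION BY min(r.person_1, r.person_2),
                                max(r.person_1, r.person_2)
                   ORDER BY r.person_1
               ) AS rn
          FROM relationships r
    )
    SELECT person_1, person_2, relationship
      FROM rel
     WHERE rn = 1
     ORDER BY person_1, person_2
""").fetchall()
print(rows)  # [(1, 2, 0), (1, 3, 0), (4, 3, 0)]
```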

Scott Swank 2009-06-12 00:01:11
