views:

141

answers:

4

I had am maintaining an query that is as follows:

select field_1, field_2
from source_table
minus
select field_1, field_2
from source_table
where status_code in (3, 600);

When I looked at this query, I immediately thought, "That's lame. Why not just use a 'NOT IN' and remove the MINUS business. So I re-wrote it as so:

select field_1, field_2
from source_table
where status_code not in (3, 600);

Just to double-check my sanity, I got counts of each query. To my surprise, the first query returned 789,089 records, and the second query returned 1,518,450 records!

I've looked at this from several angles but can't figure out how these two queries are different. Can anyone explain what is going on, or why I am an idiot this morning?

+8  A: 

These queries are indeed different. field_1 and field2 do not equate to status_code 3 and 600. field_1 could be 'A' and field_2 could be 'B', so you would be eliminating records from the first SELECT that look like A, B. The original is probably the best way of achieving the correct result.

Edit: To give you a better idea of what's going on, you could get the same result, in a similar fashion to how you thought to write the query, by doing a subquery:

select distinct field_1, field_2
from source_table
where (field_1, field_2) not in (
    select field_1, field_2
    from source_table
    where status_code in (3, 600)
);
Alison R.
Ah, just had a light-bulb moment. Thanks for the help!
j0rd4n
That still isn't identical. MINUS is a set command and will implicitly do a distinct. You'd need to add that to the replacement.
Gary
@Gary Thanks. I have changed the SQL accordingly.
Alison R.
+3  A: 

If you do not have a unique constraint on field_1, field_2 or both, Alison is probably right. Consider you have a table: A B 3 A B 10

The first query would eliminate both rows, the second only one. Alternatively, if you have NULL values in status_code column, you may get different results, (A or not A) does not work in SQL if you have NULL values in columns.

ondra
+4  A: 

If there isn't a unique constraint on the combination of field_1 and field_2, the second query may contain duplicates while the first won't, as 'minus' will suppress them. Try the second query with a 'distinct' and see if the counts match.

Alex Poole
+4  A: 

The UNION, MINUS and INTERSECT operators return only unique values. If you have two rows with identical field_1 and field_2, the first query will count it once, while the second will count it twice:

SQL> insert into source_table values ('a', 'b', 10);

SQL> insert into source_table values ('a', 'b', 10);

SQL> select field_1, field_2
  2  from source_table
  3  minus
  4  select field_1, field_2
  5  from source_table
  6  where status_code in (3, 600);

FIELD_1    FIELD_2
---------- ----------
a          b

SQL> select field_1, field_2
  2  from source_table
  3  where status_code not in (3, 600);

FIELD_1    FIELD_2
---------- ----------
a          b
a          b
Marcel Wolf
Ah, that detail makes a big difference. I had forgotten that. Thanks!
j0rd4n