I have had the chance to play around with this, and my previous comments regarding the NOT IN are a red herring in this case. The key thing is the presence of NULLs, or rather whether the indexed columns have NOT NULL constraints enforced.
This is going to depend on the version of the database you're using, because the optimizer gets smarter with each release. I'm using 11gR1 and the optimizer used the index in all cases except one: when both columns were null and I didn't include the NOT IN
clause:
SQL> desc big_table
Name Null? Type
----------------------------------- ------ -------------------
ID NUMBER
COL1 NUMBER
COL2 VARCHAR2(30 CHAR)
COL3 DATE
COL4 NUMBER
Without the NOT IN clause...
SQL> explain plan for
2 select col4, count(col1) from big_table
3 group by col4
4 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 1753714399
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31964 | 280K| | 7574 (2)| 00:01:31 |
| 1 | HASH GROUP BY | | 31964 | 280K| 45M| 7574 (2)| 00:01:31 |
| 2 | TABLE ACCESS FULL| BIG_TABLE | 2340K| 20M| | 4284 (1)| 00:00:52 |
----------------------------------------------------------------------------------------
9 rows selected.
SQL>
When I dobbed the NOT IN
clause back in, the optimizer opted to use the index. Weird.
SQL> explain plan for
2 select col4, count(col1) from big_table
3 where col1 not in (12, 19)
4 group by col4
5 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 343952376
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31964 | 280K| | 5057 (3)| 00:01:01 |
| 1 | HASH GROUP BY | | 31964 | 280K| 45M| 5057 (3)| 00:01:01 |
|* 2 | INDEX FAST FULL SCAN| BIG_I2 | 2340K| 20M| | 1767 (2)| 00:00:22 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
2 - filter("COL1"<>12 AND "COL1"<>19)
14 rows selected.
SQL>
Just to repeat, in all other cases, as long as one of the indexed columns was declared not nill, the index was used to satisfy the query. This may not be true on earlier versions of Oracle, but it probably points the way forward.