views:

142

answers:

5

I've got a table (col1, col2, ...) with an index on (col1, col2, ...). The table has got millions of rows in it, and I want to run a query:

 SELECT col1, COUNT(col2) WHERE col1 NOT IN (<couple of exclusions>) GROUP BY col1

Unfortunately, this is resulting in a full table scan of the table, which takes upwards of a minute. Is there any way of getting oracle to use the index on the columns to return the results much faster?

EDIT:

more specifically, I'm running the following query:

SELECT owner, COUNT(object_name) FROM all_objects GROUP BY owner

and there is an index on SYS.OBJ$ (SYS.I_OBJ2) which indexes the owner# and name columns; I believe I should be able to use this index in the query, rather than a full table scan of SYS.OBJ$

A: 

you could use a hint http://download.oracle.com/docs/cd/B10501_01/server.920/a96533/hintsref.htm , but remember that using an index might not always result in faster execution.

Shepherdess
A: 

(Just in case, are you sure it's doing a table scan and not an index scan?)

Try using COUNT(*) instead of COUNT(col2) (assuming this is appropriate for you problem, of course). Also, maybe try an index with just col1.

Marcelo Cantos
The query plan has got TABLE ACCESS FULL in it, with a cost of 2200. I can't create indexes on it, but shouldn't the index on (col1, col2, ...) be the same as an index just on col1 for queries against col1?
thecoop
+3  A: 

I have had the chance to play around with this, and my previous comments regarding the NOT IN are a red herring in this case. The key thing is the presence of NULLs, or rather whether the indexed columns have NOT NULL constraints enforced.

This is going to depend on the version of the database you're using, because the optimizer gets smarter with each release. I'm using 11gR1 and the optimizer used the index in all cases except one: when both columns were null and I didn't include the NOT IN clause:

SQL> desc big_table
 Name                                  Null?    Type
 -----------------------------------  ------    -------------------
 ID                                             NUMBER
 COL1                                           NUMBER
 COL2                                           VARCHAR2(30 CHAR)
 COL3                                           DATE
 COL4                                           NUMBER

Without the NOT IN clause...

SQL> explain plan for
  2      select col4, count(col1) from big_table
  3      group by col4
  4  /

Explained.

SQL> select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 1753714399

----------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           | 31964 |   280K|       |  7574   (2)| 00:01:31 |
|   1 |  HASH GROUP BY     |           | 31964 |   280K|    45M|  7574   (2)| 00:01:31 |
|   2 |   TABLE ACCESS FULL| BIG_TABLE |  2340K|    20M|       |  4284   (1)| 00:00:52 |
----------------------------------------------------------------------------------------

9 rows selected.


SQL>

When I dobbed the NOT IN clause back in, the optimizer opted to use the index. Weird.

SQL> explain plan for
  2      select col4, count(col1) from big_table
  3      where col1 not in (12, 19)
  4      group by col4
  5  /

Explained.

SQL> select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 343952376

----------------------------------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        | 31964 |   280K|       |  5057   (3)| 00:01:01 |
|   1 |  HASH GROUP BY        |        | 31964 |   280K|    45M|  5057   (3)| 00:01:01 |
|*  2 |   INDEX FAST FULL SCAN| BIG_I2 |  2340K|    20M|       |  1767   (2)| 00:00:22 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------

   2 - filter("COL1"<>12 AND "COL1"<>19)

14 rows selected.

SQL>

Just to repeat, in all other cases, as long as one of the indexed columns was declared not nill, the index was used to satisfy the query. This may not be true on earlier versions of Oracle, but it probably points the way forward.

APC
Removing the NOT IN still results in a full table scan of the table,
thecoop
I've edited the question to clarify it a bit
thecoop
A: 

I'm not sure, but I think that the only way that index can be used is grouping by (col1,col2). If that groupings result in high values, may be doing something like this could work.

select col1, sum(cnt)
from (select col1, count(*) cnt from table group by col1,col2)
group by col1

It's just a wild guess.

Samuel
A: 

You are querying against oracle's fixed tables, since you've not stated which db vesion this is, I'll assume a recent one. Have the fixed tables been analyzed and have updated statistics? Have you tried your query using the rule base optimizer by the use of the /*+ rule */ hint. Often I've seen that queries against oracle's own fixed tables perform better when the rule base optimizer is used.

MichaelN