ansaurus

Question

How can I get a COUNT(col) ... GROUP BY to use an index?

Answer 1

A:

You could use a hint (see http://download.oracle.com/docs/cd/B10501_01/server.920/a96533/hintsref.htm), but remember that using an index might not always result in faster execution.
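
As a rough sketch of what such a hint might look like, assuming a table called big_table with an index big_i2 that covers the grouped and counted columns (both names are placeholders, not taken from the question):

```sql
-- Hypothetical names: big_table and big_i2 stand in for the real table and index.
-- INDEX_FFS asks the optimizer to consider a fast full scan of the named index.
select /*+ INDEX_FFS(t big_i2) */ col1, count(col2)
from big_table t
group by col1;
```

A hint is only a request. The optimizer ignores it if the index cannot be guaranteed to contain every row the query needs, which can happen when all of the indexed columns are nullable.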

Shepherdess 2010-04-29 11:39:09

Answer 2

A:

(Just in case, are you sure it's doing a table scan and not an index scan?)

Try using COUNT(*) instead of COUNT(col2) (assuming this is appropriate for your problem, of course). Also, maybe try an index with just col1.
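
The two forms are only interchangeable when col2 never holds NULL, so that is worth checking first. A minimal sketch of the rewrite, with your_table standing in for the real table name:

```sql
-- your_table is a placeholder for the real table name.
-- COUNT(col2) counts only the rows where col2 is not null;
-- COUNT(*) counts every row in each group.
select col1, count(*)
from your_table
group by col1;
```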

Marcelo Cantos 2010-04-29 11:39:16

The query plan has got TABLE ACCESS FULL in it, with a cost of 2200. I can't create indexes on it, but shouldn't the index on (col1, col2, ...) be the same as an index just on col1 for queries against col1?

thecoop 2010-04-29 12:01:36

Answer 3

+3 A:

I have had the chance to play around with this, and my previous comments regarding the NOT IN are a red herring in this case. The key thing is the presence of NULLs, or rather whether the indexed columns have NOT NULL constraints enforced.

This is going to depend on the version of the database you're using, because the optimizer gets smarter with each release. I'm using 11gR1 and the optimizer used the index in all cases except one: when both columns were nullable and I didn't include the NOT IN clause:

SQL> desc big_table
 Name                                  Null?    Type
 -----------------------------------  ------    -------------------
 ID                                             NUMBER
 COL1                                           NUMBER
 COL2                                           VARCHAR2(30 CHAR)
 COL3                                           DATE
 COL4                                           NUMBER

Without the NOT IN clause...

SQL> explain plan for
  2      select col4, count(col1) from big_table
  3      group by col4
  4  /

Explained.

SQL> select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 1753714399

----------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           | 31964 |   280K|       |  7574   (2)| 00:01:31 |
|   1 |  HASH GROUP BY     |           | 31964 |   280K|    45M|  7574   (2)| 00:01:31 |
|   2 |   TABLE ACCESS FULL| BIG_TABLE |  2340K|    20M|       |  4284   (1)| 00:00:52 |
----------------------------------------------------------------------------------------

9 rows selected.


SQL>

When I put the NOT IN clause back in, the optimizer opted to use the index. Weird.

SQL> explain plan for
  2      select col4, count(col1) from big_table
  3      where col1 not in (12, 19)
  4      group by col4
  5  /

Explained.

SQL> select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 343952376

----------------------------------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        | 31964 |   280K|       |  5057   (3)| 00:01:01 |
|   1 |  HASH GROUP BY        |        | 31964 |   280K|    45M|  5057   (3)| 00:01:01 |
|*  2 |   INDEX FAST FULL SCAN| BIG_I2 |  2340K|    20M|       |  1767   (2)| 00:00:22 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------

   2 - filter("COL1"<>12 AND "COL1"<>19)

14 rows selected.

SQL>

Just to repeat: in all other cases, as long as one of the indexed columns was declared NOT NULL, the index was used to satisfy the query. This may not be true on earlier versions of Oracle, but it probably points the way forward.
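
A sketch of two ways to give the optimizer that guarantee, assuming BIG_I2 covers both COL1 and COL4 (the index definition isn't shown above, so that is an assumption):

```sql
-- Option 1: declare one of the indexed columns NOT NULL, so every row is
-- guaranteed to have an index entry. This fails if COL4 already holds nulls.
alter table big_table modify (col4 not null);

-- Option 2: if the table can't be altered, exclude the nulls in the query itself.
-- Note that this drops any COL4 group whose COL1 values are all null; the
-- unfiltered query would report such a group with a count of 0.
select col4, count(col1)
from big_table
where col1 is not null
group by col4;
```

Option 2 works for the same reason the NOT IN predicate did: a filter that can only be true for non-null COL1 values means every qualifying row must be present in the index.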

APC 2010-04-29 11:44:49

Removing the NOT IN still results in a full table scan.

thecoop 2010-04-29 11:54:18

I've edited the question to clarify it a bit

thecoop 2010-04-29 12:37:45

Answer 4

A:

I'm not sure, but I think the only way that index can be used is by grouping by (col1, col2). If those groupings result in high values, maybe something like this could work:

-- your_table is a placeholder; TABLE itself is a reserved word in Oracle
select col1, sum(cnt)
from (select col1, count(*) cnt from your_table group by col1, col2)
group by col1

It's just a wild guess.

Samuel 2010-04-29 12:09:55

Answer 5

A:

You are querying against Oracle's fixed tables. Since you've not stated which database version this is, I'll assume a recent one. Have the fixed tables been analyzed, and do they have up-to-date statistics? Have you tried your query with the rule-based optimizer, using the /*+ rule */ hint? I've often seen queries against Oracle's own fixed tables perform better when the rule-based optimizer is used.
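
Roughly, the two suggestions might look like this in SQL*Plus. The view name some_fixed_view and the column names are placeholders, since the actual query isn't quoted here:

```sql
-- Gather optimizer statistics on the fixed (X$) tables.
-- This needs suitable privileges on the instance.
exec dbms_stats.gather_fixed_objects_stats;

-- some_fixed_view is a placeholder for the view actually being queried.
-- The RULE hint asks Oracle to use the rule-based optimizer for this statement.
select /*+ rule */ col1, count(col2)
from some_fixed_view
group by col1;
```

Bear in mind that the rule-based optimizer is no longer supported in recent Oracle releases, so the hint is better treated as a diagnostic than as a permanent fix.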

MichaelN 2010-05-05 22:10:20
