views:

398

answers:

4

I had a query where an index was not used when I thought it could be, so I reproduced it out of curiosity:

Create a test_table with 1.000.000 rows (10 distinct values in col, 500 bytes of data in some_data).

CREATE TABLE test_table AS (
  SELECT MOD(ROWNUM,10) col, LPAD('x', 500, 'x') some_data
  FROM dual
  CONNECT BY ROWNUM <= 1000000
);

Create an index and gather table stats:

CREATE INDEX test_index ON test_table ( col );

EXEC dbms_stats.gather_table_stats( 'MY_SCHEMA', 'TEST_TABLE' );

Try to get distinct values of col and the COUNT:

EXPLAIN PLAN FOR
  SELECT col, COUNT(*)
  FROM test_table
  GROUP BY col;

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |    10 |    30 | 15816   (1)| 00:03:10 
|   1 |  HASH GROUP BY     |            |    10 |    30 | 15816   (1)| 00:03:10 
|   2 |   TABLE ACCESS FULL| TEST_TABLE |   994K|  2914K| 15755   (1)| 00:03:10 
--------------------------------------------------------------------------------- 

The index is not used, providing the hint does not change this.

I guess, the index can't be used in this case, but why?

+13  A: 

UPDATE: Try making the col column NOT NULL. That is the reason it's not using the index. When it's not null, here's the plan.

SELECT STATEMENT, GOAL = ALL_ROWS           69  10  30
                    HASH GROUP BY           69  10  30
 INDEX FAST FULL SCAN   SANDBOX TEST_INDEX  56  98072   294216

If the optimizer determines that it's more efficient NOT to use the index (maybe due to rewriting the query), then it won't. Optimizer hints are just that, namely, hints to tell Oracle an index you'd like it to use. You can think of them as suggestions. But if the optimizer determines that it's better not to use the index (again, as result of query rewrite for example), then it's not going to.

Refer to this link: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/hintsref.htm "Specifying one of these hints causes the optimizer to choose the specified access path only if the access path is available based on the existence of an index or cluster and on the syntactic constructs of the SQL statement. If a hint specifies an unavailable access path, then the optimizer ignores it."

Since you are running a count(*) operation, the optimizer has determined that it's more efficient to just scan the whole table and hash instead of using your index.

Here's another handy link on hints: http://www.dba-oracle.com/t_hint_ignored.htm

dcp
None of the data from the table is being outputted in the query, just `col` and `count(*)`. The entire result could be produced by scanning the index. Doesn't it seem logical that it would be faster to scan the index instead of the table (even if it has to do a full scan either way)? The index is smaller..
Sam
@dcp: I know how hints work, I was curious why the index was not used in first place.
Peter Lang
@Sam: Yes, that's what I thought too...
Peter Lang
@Sam/@Peter Lang - You can see my latest update for the answer.
dcp
I don't think Oracle can determine the count of NULL's from the index, so an index scan cannot be used.
dpbradley
+1: Excellent, thanks for pointing us in the right direction! I'm going to accept `APC`s answer though, as it provides more details about the reasons for this behavior.
Peter Lang
+5  A: 

I ran Peter's original stuff and reproduced his results. I then applied dcp's suggestion...

SQL> alter table test_table modify col not null;

Table altered.

SQL> EXEC dbms_stats.gather_table_stats( user, 'TEST_TABLE' , cascade=>true)

PL/SQL procedure successfully completed.

SQL> EXPLAIN PLAN FOR
  2    SELECT col, COUNT(*)
  3    FROM test_table
  4    GROUP BY col;

Explained.

SQL> select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 2099921975

------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |    10 |    30 |   574   (9)| 00:00:07 |
|   1 |  HASH GROUP BY        |            |    10 |    30 |   574   (9)| 00:00:07 |
|   2 |   INDEX FAST FULL SCAN| TEST_INDEX |  1000K|  2929K|   532   (2)| 00:00:07 |
------------------------------------------------------------------------------------

9 rows selected.

SQL>

The reason this matters, is because NULL values are not included in a normal B-TREE index, but the GROUP BY has to include NULL as a grouping "value" in your query. By telling the optimizer that there are no NULLs in col it is free to use the much more efficient index (I was getting an elapsed time of almost 3.55 seconds with the FTS). This is a classic example of how metadata can influence the optimizer.

Incidentally, this is obviously a 10g or 11g database, because it uses the HASH GROUP BY algorithm, instead of the older SORT (GROUP BY) algorithm.

APC
@APC - It's dcp, not Dep :)
dcp
@dcp - sorry, I need to eat more carrots. I have corrected my bloomer.
APC
@APC - No problem. Hey, at least you didn't type "Dope" :).
dcp
Thanks! Accepted this answer as it provides the details about the reasons for this behavior.
Peter Lang
+9  A: 

Hi Peter,

you forgot this really important information: COL is not null

If the column is NULLABLE, the index can not be used because there might be unindexed rows.

SQL> ALTER TABLE test_table MODIFY (col NOT NULL);

Table altered
SQL> EXPLAIN PLAN FOR
  2  SELECT col, COUNT(*) FROM test_table GROUP BY col;

Explained
SQL> SELECT * FROM table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1077170955
--------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |    10 |    30 |  1954   (1)| 00:00:2
|   1 |  SORT GROUP BY NOSORT|            |    10 |    30 |  1954   (1)| 00:00:2
|   2 |   INDEX FULL SCAN    | TEST_INDEX |   976K|  2861K|  1954   (1)| 00:00:2
--------------------------------------------------------------------------------
Vincent Malgrat
A: 

bitmap index will do as well

Execution Plan
----------------------------------------------------------
Plan hash value: 2200191467

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |    10 |    30 | 15983   (2)| 00:03:12 |
|   1 |  HASH GROUP BY     |            |    10 |    30 | 15983   (2)| 00:03:12 |
|   2 |   TABLE ACCESS FULL| TEST_TABLE |  1013K|  2968K| 15825   (1)| 00:03:10 |
---------------------------------------------------------------------------------

SQL> create bitmap index test_index on test_table(col);

Index created.

SQL> EXEC dbms_stats.gather_table_stats( 'MY_SCHEMA', 'TEST_TABLE' );

PL/SQL procedure successfully completed.

SQL> SELECT col, COUNT(*) 2 FROM test_table 3 GROUP BY col 4 /

Execution Plan
----------------------------------------------------------
Plan hash value: 238193838

---------------------------------------------------------------------------------------
| Id  | Operation                | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |            |    10 |    30 |   286   (0)| 00:00:04 |
|   1 |  SORT GROUP BY NOSORT    |            |    10 |    30 |   286   (0)| 00:00:04 |
|   2 |   BITMAP CONVERSION COUNT|            |  1010K|  2961K|   286   (0)| 00:00:04 |
|   3 |    BITMAP INDEX FULL SCAN| TEST_INDEX |       |       |            |          |
---------------------------------------------------------------------------------------

SQL>

iggy
Bitmap indexes are suitable for only a limited range of tables. If our table is subject to a lot of inserts, updates or deletes then the cost of maintaining the index may be too much.
APC