ansaurus

Question

is there a tricky way to optimize this query

Answer 1

+3 A:

select * from my_big_table where MONTH(exam_date) = 5

oops.. Oracle huh?..

select * from my_big_table where EXTRACT(MONTH from exam_date) = 5

Fosco 2010-07-14 15:43:26

+1 Oracle has a couple of efficient date handling methods which you should use in WHERE because they can make use of the index in weird ways.

Aaron Digulla 2010-07-14 15:47:48

@learn_plsql how much faster is this? Oracle still has to do a full table scan. Have you tried a Function Based Index? It may or may not help because the data has at most 12 distinct values.

2010-07-14 21:18:51

Answer 2

+5 A:

I don't know Oracle, but what about doing

WHERE exam_date BETWEEN first_of_month AND last_of_month

where the two dates are constant expressions.

harpo 2010-07-14 15:44:14

doing: `exam_date between To_date('11/01/2010','MM/DD/YYYY') and to_date('11/30/2010','MM/DD/YYYY')` actually takes longer...

learn_plsql 2010-07-14 15:55:09

Well, it was worth a shot :)

harpo 2010-07-14 20:39:39

Answer 3

+1 A:

The function call means that Oracle won't be able to use any index that might be defined on the column.

Either remove the function call (as in harpo's answer) or use a function based index.

cagcowboy 2010-07-14 15:48:57

Answer 4

+3 A:

Bear in mind that since you want approximately 1/12th of all the data, it may well be more efficient for Oracle to perform a full table scan anyway. This may explain why performance was worse when you followed harpo's advice.

Why? Suppose your data is such that 20 rows fit on each database block (on average), so that you have a total of 3,000,000/20 = 150,000 blocks. That means a full table scan will require 150,000 block reads. Now about 1/12th of the 3,000,000 rows will be for month 05. 3,000,000/12 is 250,000. So that's 250,000 table reads if you use the index - and that's ignoring the index reads that will also be required. So in this example the full table scan does a lot less work than the indexed search.

Tony Andrews 2010-07-14 16:23:47

Hmmm, while I was typing up my answer you posted one which also starts "Bear in mind", making exactly the same point. Great minds and all that ;)

APC 2010-07-14 20:20:07

Indeed - almost spooky!

Tony Andrews 2010-07-15 08:41:47

Answer 5

+3 A:

Bear in miond that there are only twelve distinct values for MONTH. So unless you have a strongly clustered set of records (say if you use partitioining) it is possible that using an index is not necessarily the most efficient way of querying in this fashion.

I didn't find that using EXTRACT() lead the optimizer to use a regular index on my date column but YMMV:

SQL> create index big_d_idx on big_table(col3) compute statistics
  2  /

Index created.

SQL> set autotrace traceonly explain

SQL> select * from big_table
  2  where extract(MONTH from col3) = 'MAY'
  3  /

Execution Plan
----------------------------------------------------------
Plan hash value: 3993303771

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           | 23403 |  1028K|  4351   (3)| 00:00:53 |
|*  1 |  TABLE ACCESS FULL| BIG_TABLE | 23403 |  1028K|  4351   (3)| 00:00:53 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(EXTRACT(MONTH FROM INTERNAL_FUNCTION("COL3"))=TO_NUMBER('M
              AY'))

SQL>

What definitely can persuade the optimizer to use an index in these scenarios is building a function-based index:

SQL> create index big_mon_fbidx on big_table(extract(month from col3))
  2  /

Index created.

SQL> select * from big_table
  2  where extract(MONTH from col3) = 'MAY'
  3  /

Execution Plan
----------------------------------------------------------
Plan hash value: 225326446

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)|Time    |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               | 23403 |  1028K|   475   (0)|00:00:06|
|   1 |  TABLE ACCESS BY INDEX ROWID| BIG_TABLE     | 23403 |  1028K|   475   (0)|00:00:06|
|*  2 |   INDEX RANGE SCAN          | BIG_MON_FBIDX |  9361 |       |   382   (0)|00:00:05|
-------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(EXTRACT(MONTH FROM INTERNAL_FUNCTION("COL3"))=TO_NUMBER('MAY'))

SQL>

APC 2010-07-14 16:27:10

ansaurus

tags:

views:

answers:

is there a tricky way to optimize this query

related questions