views:

96

answers:

5

I'm working on a table that has 3008698 rows

exam_date is a DATE field.

But queries I run want to match only the month part. So what I do is:

select * from my_big_table where to_number(to_char(exam_date, 'MM')) = 5;

which I believe takes long because of function on the column. Is there a way to avoid this and make it faster? other than making changes to the table? exam_date in the table have different date values. like 01-OCT-10 or 12-OCT-10...and so on

+3  A: 
select * from my_big_table where MONTH(exam_date) = 5

oops.. Oracle huh?..

select * from my_big_table where EXTRACT(MONTH from exam_date) = 5
Fosco
+1 Oracle has a couple of efficient date handling methods which you should use in WHERE because they can make use of the index in weird ways.
Aaron Digulla
@learn_plsql how much faster is this? Oracle still has to do a full table scan. Have you tried a Function Based Index? It may or may not help because the data has at most 12 distinct values.
+5  A: 

I don't know Oracle, but what about doing

WHERE exam_date BETWEEN first_of_month AND last_of_month

where the two dates are constant expressions.

harpo
doing: `exam_date between To_date('11/01/2010','MM/DD/YYYY') and to_date('11/30/2010','MM/DD/YYYY')` actually takes longer...
learn_plsql
Well, it was worth a shot :)
harpo
+1  A: 

The function call means that Oracle won't be able to use any index that might be defined on the column.

Either remove the function call (as in harpo's answer) or use a function based index.

cagcowboy
+3  A: 

Bear in mind that since you want approximately 1/12th of all the data, it may well be more efficient for Oracle to perform a full table scan anyway. This may explain why performance was worse when you followed harpo's advice.

Why? Suppose your data is such that 20 rows fit on each database block (on average), so that you have a total of 3,000,000/20 = 150,000 blocks. That means a full table scan will require 150,000 block reads. Now about 1/12th of the 3,000,000 rows will be for month 05. 3,000,000/12 is 250,000. So that's 250,000 table reads if you use the index - and that's ignoring the index reads that will also be required. So in this example the full table scan does a lot less work than the indexed search.

Tony Andrews
Hmmm, while I was typing up my answer you posted one which also starts "Bear in mind", making exactly the same point. Great minds and all that ;)
APC
Indeed - almost spooky!
Tony Andrews
+3  A: 

Bear in miond that there are only twelve distinct values for MONTH. So unless you have a strongly clustered set of records (say if you use partitioining) it is possible that using an index is not necessarily the most efficient way of querying in this fashion.

I didn't find that using EXTRACT() lead the optimizer to use a regular index on my date column but YMMV:

SQL> create index big_d_idx on big_table(col3) compute statistics
  2  /

Index created.

SQL> set autotrace traceonly explain

SQL> select * from big_table
  2  where extract(MONTH from col3) = 'MAY'
  3  /

Execution Plan
----------------------------------------------------------
Plan hash value: 3993303771

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           | 23403 |  1028K|  4351   (3)| 00:00:53 |
|*  1 |  TABLE ACCESS FULL| BIG_TABLE | 23403 |  1028K|  4351   (3)| 00:00:53 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(EXTRACT(MONTH FROM INTERNAL_FUNCTION("COL3"))=TO_NUMBER('M
              AY'))

SQL>

What definitely can persuade the optimizer to use an index in these scenarios is building a function-based index:

SQL> create index big_mon_fbidx on big_table(extract(month from col3))
  2  /

Index created.

SQL> select * from big_table
  2  where extract(MONTH from col3) = 'MAY'
  3  /

Execution Plan
----------------------------------------------------------
Plan hash value: 225326446

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)|Time    |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               | 23403 |  1028K|   475   (0)|00:00:06|
|   1 |  TABLE ACCESS BY INDEX ROWID| BIG_TABLE     | 23403 |  1028K|   475   (0)|00:00:06|
|*  2 |   INDEX RANGE SCAN          | BIG_MON_FBIDX |  9361 |       |   382   (0)|00:00:05|
-------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(EXTRACT(MONTH FROM INTERNAL_FUNCTION("COL3"))=TO_NUMBER('MAY'))

SQL>
APC