Database

Table1
 Id
 Table2Id

...

Table2
  Id
  StartTime
  Duration  //in hours

Query

select * from Table1 join Table2 on Table2Id = Table2.Id 
where starttime < :starttime and starttime + Duration/24 > :endtime

This query currently takes about 2 seconds to run, which is too long. There is an index on the id columns and a function-based index on starttime + Duration/24. In SQL Developer, the query plan shows no indexes being used. The query returns 475 rows for my test start and end times. Table2 has ~800k rows and Table1 has ~200k rows.

If the Duration/24 calculation is removed from the query and replaced with a static value, the query time is reduced by half. This does not retrieve exactly the same data, but it leads me to believe that the division is expensive.

I have also tested adding an endtime column to Table2 that is populated with (starttime + Duration/24). The column was prepopulated via a single update; if it were used in production, I would populate it via an update trigger.

select * from Table1 join Table2 on Table2Id = Table2.Id 
where starttime < :starttime and endtime > :endtime

This query runs in about 600ms and uses an index for the join. It is less than ideal because of the additional column with redundant data.
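For reference, the trigger mentioned above for keeping endtime in sync could be sketched roughly as follows. The trigger name is made up for illustration, and it fires on insert as well as update so that new rows also get a value. This is an untested sketch under those assumptions, not something running in production.

```sql
-- Hypothetical trigger name; column names follow the schema above.
-- Keeps the redundant endtime column equal to starttime + Duration/24.
CREATE OR REPLACE TRIGGER table2_set_endtime
BEFORE INSERT OR UPDATE OF starttime, duration ON table2
FOR EACH ROW
BEGIN
  -- Duration is stored in hours; Oracle DATE arithmetic works in days.
  :NEW.endtime := :NEW.starttime + :NEW.duration / 24;
END;
/
```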

Are there any methods of making this query faster?

+1  A: 

Oracle will not use an index if the selectivity of the where clause is poor. An index is used only when the number of rows returned is a small enough percentage of the total rows in the table (the threshold varies, since Oracle weighs the cost of reading the index as well as reading the table).

Also, when an indexed column is wrapped in an expression in the where clause, a plain index on that column cannot be used. For example, UPPER(some_index_column) prevents the use of an index on some_index_column, unless a function-based index on that exact expression exists. This is why starttime + Duration/24 > :endtime cannot use a plain index on starttime.

Can you try this

select * from Table1 join Table2 on Table1.Table2Id = Table2.Id 
where starttime < :starttime and starttime > :endtime - Duration/24

This should allow the use of the Index and there is no need for an additional column.

Sathya
that knocks 150-200ms off the query time, and uses the index for the join. Better but not great.
Darryl Braaten
-1 because of the 5% remark and the "I feel" without testing, but mostly the 5%, which is nonsense.
Rob van Wijk
May I know why you think 5% is nonsense? Literature suggests that the selectivity should be around 5% to 20%. The exact value depends on the database. Look at this post, where 15% selectivity is mentioned. http://stackoverflow.com/questions/212264/how-to-choose-and-optimize-oracle-indexes
Sathya
Every query has its own percentage where an index would be favoured above not using the index. And that percentage varies between 0 and 100%, depending on relative size of the index, clustering factor, selectivity and much more. It's certainly not a fixed number, like it sounded in your original message.
Rob van Wijk
@Rob, edited to reflect your comments.
Mark Harrison
+3  A: 

Create a function-based index on both starttime and the expression starttime + Duration/24:

create index myindex on table2(starttime, starttime + Duration / 24);

A compound index covering the entire predicate of your query should be selected. With the columns indexed individually, the optimizer is likely deciding that repeated table accesses by rowid, driven by a scan of one of those indexes, would be slower than a full table scan.

Also make sure that you're not doing an implicit conversion from varchar to date, by ensuring that you're passing DATEs in your bind variables.

Try lowering the optimizer_index_cost_adj system parameter. I believe the default is 100. Try setting that to 10 and see if your index is selected.
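One low-risk way to try this is to change the parameter for the current session only, so that nothing else on the instance is affected. The value 10 is just the starting point suggested above, not a recommendation.

```sql
-- Session-level only; other sessions and the instance default are unaffected.
ALTER SESSION SET optimizer_index_cost_adj = 10;

-- Re-run the query from the question in the same session and compare the plan.
select * from Table1 join Table2 on Table2Id = Table2.Id
where starttime < :starttime and starttime + Duration/24 > :endtime;

-- Restore the default afterwards.
ALTER SESSION SET optimizer_index_cost_adj = 100;
```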

Consider partitioning the table by starttime.
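A rough sketch of what range partitioning by starttime might look like is below. The table name, column types, and partition boundaries are all assumptions made for illustration, and partitioning requires the separately licensed Partitioning option, so this may not be available on every installation.

```sql
-- Hypothetical table name, column types and yearly boundaries.
CREATE TABLE table2_part (
  id        NUMBER PRIMARY KEY,
  starttime DATE   NOT NULL,
  duration  NUMBER            -- in hours, as in the question
)
PARTITION BY RANGE (starttime) (
  PARTITION p2008 VALUES LESS THAN (TO_DATE('2009-01-01', 'YYYY-MM-DD')),
  PARTITION p2009 VALUES LESS THAN (TO_DATE('2010-01-01', 'YYYY-MM-DD')),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);
```

With a layout like this, a predicate on starttime lets the optimizer skip partitions that cannot contain matching rows.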

Apocalisp
As I mentioned in the question that index already exists. Dates are being passed in as dates on the parametrized query. I will look at the optimizer_index_cost_adj
Darryl Braaten
Does that compound index exist?
Apocalisp
We would expect that a composite function-based index (as suggested by Apocalisp) would be an ideal candidate for the query. (All the normal suggestions apply here: statistics up-to-date, EXPLAIN PLAN, SQL*Plus AUTOTRACE, event 10046 trace, event 10053 trace.) Good tip on the possible implicit data conversion. (I normally pass all bind arguments as strings and do the explicit conversion in the statement, ... starttime >= TO_DATE(:b1,'YYYYMMDDHH24MISS') ..., rather than leave it up to Oracle.) SQL Developer is showing no index used; is SQL Developer seeing the bind args as DATE datatype?
spencer7593
SQL Developer is just using TO_DATE on the strings. Setting optimizer_index_cost_adj to 5 brings the query down to 160ms which is very good. Now I need to find out the repercussions of making this change at the database level. For testing I just set it at the session, but I don't think the ORM we are using allows changing session values.
Darryl Braaten
Marking this as accepted for the optimizer_index_cost_adj suggestion.
Darryl Braaten
@Apocalisp: you might also suggest an index on the foreign key column referenced by the join condition ( ... on table1(table2id) ...), as well as including the id column in the (suggested) composite index on table2 (starttime,starttime+duration/24,id) ... but that's probably only going to help if there were no additional columns in table2 that had to be retrieved (to return in the result set).
spencer7593
@Darryl: have you tried a hint on the statement? That's less radical than changing an instance-wide parameter. (Yes, the instance parameter should be set to an appropriate value.) SELECT /*+ INDEX(table2 myindex) */
spencer7593
optimizer_index_cost_adj: be careful adjusting this. According to Jonathan Lewis, you really shouldn't need to change it if you are gathering system stats, and my own testing has backed this up. If you change this value at the database level, it will likely affect many queries, not just this one.
Matthew Watson
@Matthew: yes, it almost goes without saying, changing an instance parameter to improve performance of one query is likely going to have unintended, detrimental consequences to other operations. Statistics up-to-date with DBMS_STATS.gather_system_stats, .gather_schema_stats,
spencer7593
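The hint and statistics suggestions from the comments above could be tried roughly as follows. This is a sketch: myindex is the composite index name from the answer, the join columns follow the schema in the question, and gathering system statistics normally needs DBA-level privileges.

```sql
-- Index hint scoped to this one statement, instead of an instance-wide change.
SELECT /*+ INDEX(Table2 myindex) */ *
FROM Table1 JOIN Table2 ON Table1.Table2Id = Table2.Id
WHERE Table2.starttime < :starttime
  AND Table2.starttime + Table2.Duration / 24 > :endtime;

-- Refresh optimizer statistics for the current schema, then system statistics.
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(ownname => USER);
  DBMS_STATS.GATHER_SYSTEM_STATS;
END;
/
```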
+1  A: 

You have two criteria with range predicates (greater than/less than). An index range scan can start at one point in the index and end at another.

For a compound index on starttime and "Starttime+duration/24", since the leading column is starttime and the predicate is "less than bind value", it will start at the left most edge of the index (earliest starttime) and range scan all rows up to the point where the starttime reaches the limit. For each of those matches, it can evaluate the calculated value for "Starttime+duration/24" on the index against the bind value and pass or reject the row. I'd suspect most of the data in the table is old, so most entries have an old starttime and you'd end up scanning most of the index.

For a compound index on "Starttime+duration/24" and starttime, since the leading column is the function and the predicate is "greater than bindvalue", it will start partway through the index and work its way to the end. For each of those matches, it can evaluate the starttime on the index against the bind value and pass or reject the row. If the enddate passed in is recent, I suspect this would actually involve a much smaller amount of the index being scanned.
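To compare the two column orders described above, one could create the second index and ask the optimizer what it would do. The index name and the date format mask are assumptions for illustration. The binds are wrapped in TO_DATE because, as far as I know, EXPLAIN PLAN treats bind variables as strings rather than dates.

```sql
-- Hypothetical index name; the expression leads and starttime follows.
create index table2_end_start_ix on table2(starttime + Duration / 24, starttime);

-- Ask the optimizer for its plan, including its row estimates.
EXPLAIN PLAN FOR
select * from Table1 join Table2 on Table1.Table2Id = Table2.Id
where starttime < TO_DATE(:starttime, 'YYYY-MM-DD HH24:MI:SS')
  and starttime + Duration/24 > TO_DATE(:endtime, 'YYYY-MM-DD HH24:MI:SS');

-- Show the plan just generated; check which index (if any) is used and the Rows column.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```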

Even without starttime as a second column on the index, the existing function-based index on "Starttime+duration/24" should still be useful and used. Check the explain plan to make sure the bind value is either a date or converted to a date. If it is converted, make sure the appropriate format mask is used (e.g. an entered value of '1/Jun/09' may be converted to year 0009, so Oracle will see the condition as very relaxed and would tend not to use the index, plus the result could be wrong).

"In Sql Developer the query plan shows no indexes being used. " If the index wasn't being used to find the table2 rows, I suspect the optimizer thought most/all of table2 would be returned [which it obviously isn't, by your numbers]. I'd guess that it thought most of table1 would be returned, and thus that neither of your predicates did a lot of filtering. As I said above, I think the "less than" predicate isn't selective, but the "greater than" should be. Look at the explain plan, especially the ROWS value, to see what Oracle thinks.

PS. Adjusting optimizer_index_cost_adj means the optimizer changes the basis for its estimates. If a journey planner says a trip will take six hours because it assumes an average speed of 50, and you tell it to assume an average of 100, it will come out with three hours. It won't actually affect the speed you travel at, or how long the journey actually takes. So you only want to change that value to make it more accurately reflect the actual value for your database (or session).

Gary