ansaurus

Question

Oracle explain plan estimates incorrect cardinality for an index range scan

Answer 1

+2 A:

The cardinality estimate would be based on the product of the selectivity of a.key1 and a.key2, which (at least in 10g) would each be based on the number of distinct values for those two columns as recorded in the column statistics.

For a table of 5M rows, a cardinality estimate of 406 is not significantly different to 7366. The question you have to ask yourself is, is the "inaccurate" estimate here causing a problem?

You can check what plan Oracle would choose if it were able to generate a perfectly accurate estimate by getting an explain plan for this:

select /*+CARDINALITY(a 7366)*/ count(a.id) 
from LARGE_PARTITIONED_TABLE a
join SMALL_NONPARTITIONED_TABLE b on a.key1 = b.key1 and a.key2 = b.key2
where b.id = 1000;

If this comes up with the same plan, then the estimate that Oracle is calculating is already adequate.

Jeffrey Kemp 2010-07-01 11:58:11

Thanks Jeffrey. You are of course correct in that the cardinality estimate difference in this example is not significant. However, this is just a simplified case I prepared for this question - I have other cases of queries against same tables where the difference is much bigger - and presumably more significant as well.I am still unable to come up with the formula Oracle uses to arrive at cardinality of 406. Num_distinct for columns a and b are 116 and 650, respectively.

Tommi 2010-07-01 12:39:03

Answer 2

+1 A:

You might be interested to read this excellent paper by Wolfgang Breitling which has a lot of info on CBO calculations: http://www.centrexcc.com/A%20Look%20under%20the%20Hood%20of%20CBO%20-%20the%2010053%20Event.pdf.

As explained there, because you have histograms, the filter-factor calculation for these columns does not use number of distinct values (NDV) but density, which is derived from the histogram in some way.

What are the DENSITY values in USER_TAB_COLUMNS for a.key1 and a.key2?

Generally, the problem in cases like this is that Oracle does not gather statistics on pairs of columns, and assumes that their combined filter factor will be the product of their individual factors. This will produce low estimates if there is any correlation between values of the two columns.

If this is causing a serious performance issue, I suppose you could create a function-based index on a function of those columns, and use that to do the lookup. Then Oracle would gather statistics on that index and probably produce better estimates.

Dave Costa 2010-07-01 13:20:16

Thanks Dave, that paper looks like an interesting read indeed. I need to digest this a bit - I'll post the results tomorrow!

Tommi 2010-07-01 13:58:40

I ran a 10053 trace on the query - please see the updated question. Density values for a.key1 and a.key2 are 0,000000111 and 0,0022831.

Tommi 2010-07-02 06:35:25

Answer 3

+4 A:

Generating a 10053 trace file will help show exactly what choices the optimizer's making regarding its estimation of cardinality and selectivity. Jonathan Lewis' excellect Cost Based Oracle Fundamentals is an excellent resource to understanding how the optimizer works, and the printing I have spans 8i to 10.1.

From that work:

Join Selectivity =   ((num_rows(t1) - num_nulls(t1.c1)) / num_rows(t1)) 
                   * ((num_rows(t2) - num_nulls(t2.c2)) / num_rows(t2))
                   / greater (num_distinct(t1.c1), num_distinct(t2.c2))

Join Cardinality =   Join Selectivity 
                   * filtered_cardinality (t1)
                   * filtered_cardinality (t2)

However, because we have a multi-column join, Join Selectivity isn't at the table level, it's the product (intersection) of the join selectivities on each column. Assuming there's no nulls in play:

Join Selectivity = Join Selectivity (key1) * Join Selectivity (key2)

Join Selectivity (key1) =   ((5,000,000 - 0) / 5,000,000)
                          * ((10,000 - 0)) / 10,000)
                          / max (116, ?)  -- distinct key1 values in B

                        = 1 / max(116, distinct_key1_values_in_B)

Join Selectivity (key2) =   ((5,000,000 - 0) / 5,000,000)
                          * ((10,000 - 0)) / 10,000)
                          / max (650, ?)  -- distinct key2 values in B

                        = 1 / max(650, distinct_key2_values in B)

Join Cardinality =  JS(key1) * JS(key2) 
                  * Filter_cardinality(a) * Filter_cardinality(b)

We know that there are no filters on A, so that's tables filter cardinality is the number of rows. We're selecting the key value from B, so that table's filter cardinality is 1.

So the best case for estimated estimated join cardinality is now

Join Cardinality  = 1/116 * 1/650 * 5,000,000 * 1

                  =~ 67

It might be easier to work backward. Your estimated cardinality of 406, given what we know, leads to a join selectivty of 406/5,000,000, or approximately 1/12315. That happens to be really, really close to 1 / (116^2), which is a sanity check within the optimizer to prevent it from finding too aggressive a cardinality on multi-column joins.

For the TL;DR crowd:

Get Jonathan Lewis' Cost Based Oracle Fundamentals.
Get a 10053 trace of the query whose behavior your can't understand.

Adam Musch 2010-07-01 14:13:59

Thanks Adam. I will certainly have a look at the 10053 trace of the query. Will post my results tomorrow.

Tommi 2010-07-01 16:43:52

I have now updated the question with the 10053 trace, please see above.

Tommi 2010-07-02 06:31:03

I can't answer the second question -- Oracle's got lots of fudge factors, and the concatentated index cardinality rules aren't well documented. Besides, it's a factor that cancels itself out.The A-B join cardinality is based on the relative cardinality of A (1 row) multiplied by the absolute cardinality of B (4897354) multiplied by the join selectivity -- which is 1 / absolute cardinality of A.Look at it this way. Your tables map 12046 rows in A to 4897354 rows in B. Therefore for each row in A, there are, on average, 406 rows in B.

Adam Musch 2010-07-02 14:15:02

You also may wish to look at metalink note 728004.1 "How ORACLE CBO Calculate Join Cardinality with Join Elimination Filter"

Adam Musch 2010-07-02 14:37:36

I accepted this answer because it did help me figure out where the estimated cardinality came from - thanks Adam! What I'm still puzzled with is how to improve query performance in a case like this.

Tommi 2010-07-02 17:39:48

ansaurus

tags:

views:

answers:

Oracle explain plan estimates incorrect cardinality for an index range scan

related questions