I have an Oracle 10.2.0.3 database, and a query like this:
select count(a.id)
from LARGE_PARTITIONED_TABLE a
join SMALL_NONPARTITIONED_TABLE b on a.key1 = b.key1 and a.key2 = b.key2
where b.id = 1000
Table LARGE_PARTITIONED_TABLE (a) has about 5 million rows, and is partitioned by a column not present in the query. Table SMALL_NONPARTITIONED_TABLE (b) is not partitioned, and holds about 10000 rows.
Statistics are up-to-date, and there are height balanced histograms in columns key1 and key2 of table a.
Table a has a primary key and a global, nonpartitioned unique index on columns key1, key2, key3, key4, and key5.
Explain plan for the query displays the following results:
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 31 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 31 | | |
| 2 | NESTED LOOPS | | 406 | 12586 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN| INDEX_ON_TABLE_B | 1 | 19 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN| PRIMARY_KEY_INDEX_OF_TABLE_A | 406 | 4872 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("b"."id"=1000)
4 - access("a"."key1"="b"."key1" and
"a"."key2"="b"."key2")
Thus the rows (cardinality) estimated for step 4 is 406.
Now, a tkprof trace reveals the following:
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=51 pr=9 pw=0 time=74674 us)
7366 NESTED LOOPS (cr=51 pr=9 pw=0 time=824941 us)
1 INDEX RANGE SCAN INDEX_ON_TABLE_B (cr=2 pr=0 pw=0 time=36 us)(object id 111111)
7366 INDEX RANGE SCAN PRIMARY_KEY_INDEX_OF_TABLE_A (cr=49 pr=9 pw=0 time=810173 us)(object id 222222)
So the cardinality in reality was 7366, not 406!
My question is this: From where does Oracle get the estimated cardinality of 406 in this case, and how can I improve its accuracy, so that the estimate is more in line of what really happens during query execution?
Update: Here is a snippet of a 10053 trace I ran on the query.
NL Join
Outer table: Card: 1.00 Cost: 2.00 Resp: 2.00 Degree: 1 Bytes: 19
Inner table: LARGE_PARTITIONED_TABLE Alias: a
...
Access Path: index (IndexOnly)
Index: PRIMARY_KEY_INDEX_OF_TABLE_A
resc_io: 2.00 resc_cpu: 27093
ix_sel: 1.3263e-005 ix_sel_with_filters: 1.3263e-005
NL Join (ordered): Cost: 4.00 Resp: 4.00 Degree: 1
Cost_io: 4.00 Cost_cpu: 41536
Resp_io: 4.00 Resp_cpu: 41536
****** trying bitmap/domain indexes ******
Best NL cost: 4.00
resc: 4.00 resc_io: 4.00 resc_cpu: 41536
resp: 4.00 resp_io: 4.00 resp_cpu: 41536
Using concatenated index cardinality for table SMALL_NONPARTITIONED_TABLE
Revised join sel: 8.2891-e005 = 8.4475e-005 * (1/12064.00) * (1/8.4475e-005)
Join Card: 405.95 = outer (1.00) * inner (4897354.00) * sel (8.2891-e005)
Join Card - Rounded: 406 Computed: 405.95
So that is where the value 406 is coming from. Like Adam answered, join cardinality is join selectivity * filter cardinality (a) * filter cardinality (b)
, as can be seen on the second to last line of above trace quote.
What I do not understand is the Revised join sel
line. 1/12064 is the selectivity of the index used to find the row from table b (12064 rows on table, and select based on unique id). But so what?
Cardinality appears to be calculated by multiplying the filter cardinality of table b (4897354) with selectivity of table a (1/12064). Why? What does the selectivity on table a have to do with how much rows is expected to be found from table b, when a --> b join is not based on a.id?
Where does the number 8.4475e-005 come from (it does not appear anywhere else in the whole trace)? Not that it affects the output, but I'd still like to know.
I understand that the optimizer has likely chosen the correct path here. But still the cardinality is miscalculated - and that can have a major effect on the execution path that is chosen from that point onwards (as in the case I'm having IRL - this example is a simplification of that).