tags:

views:

129

answers:

3

I have a super-simple query in a star schema. One fact, two dimensions. I have verified that I am doing the joins properly. But when I execute the query plan, I get a merge cartesian join right before the steps to add the dimensions.

explain plan for
select * from fact f
inner join dim1 d1
   on d1.id = f.d1_id
inner join dim2 d2
   on d2.id = f.d2_id
where d1.code = 'A' and d2.code = 'B';

If I change to search by the dimension ID instead of the code, my plan is fine - no cartesian.

explain plan for
select * from fact f
inner join dim1 d1
   on d1.id = f.d1_id
inner join dim2 d2
   on d2.id = f.d2_id
where d1.id= '1' and d2.id = '2';

Any ideas what could cause the cartesian to happen?

EDIT: I just created the tables and indexes today. I will verify that I did "compute statistics" on them to be sure all is up-to-date.

more information on the tables now that I have edited them and got rid of the cartesian:

Fact table:

bitmap index on dim1.id

bitmap index on dim2.id

(and lots more bitmap indexes)

Dim1

unique index on id

bitmap index on code --this is new, but it didn't seem to change the query plan any.

Dim2

unique index on id

unique index on code --when I added this, the cartesian went away.

My fact table has 50 million records, dim1 has 44 records, and dim2 has 6 records. So I didn't originally have indexes on such short tables. But adding the unique index to dim2 is what got rid of the cartesian join and dropped the query plan time estimate from 5 minutes to a few seconds.

+2  A: 

It seems that "ID" is much more selective than "code", so the optimizer decides to apply the conditions after the join. Try if adding indexes on code (if possible, unique ones) changes anything and gives you quicker results.

IronGoofy
thanks! I was unable to make the dim1.code unique, but I was able to make dim2.code unique. That fixed it. thanks for the idea! But it seems this could cause me problems eventually. Not all my dims have anything unique except the ID.
sql_mommy
A: 

There are a couple of things to lok at here.

Firstly, how do the execution plans compare in terms of estimated cardinalities? This can be a very significant driver of plan changes. If your statistics are out of date (try using dynamic sampling) then the cardinalities of the dimension result sets could be incorrectly very low or very high.

Secondly, in the latter query Oracle is probably using transitivity to apply those predicates directly to the fact table in which case it might have a much more accurate estimation of the size of the result set from that table (especialy with dynamic sampling or multicolumn statistics in 11g).

The explain plans are critical to the analysis though.

David Aldridge
A: 

Sorry I can't comment, I don't have the rep.

Can you post the table statistics and definitions of the tables in question (including indexes)?

StevenWilkins