ansaurus

Question

Adding redudant join conditions in Oracle results in a different plan

Answer 1

+1 A:

Those two queries do not look the same to me at all.
Then again I am not the Oracle optimizer.

Since both B and C have foreign keys of A when you have done this

INNER JOIN B
ON B.id = A.id

the resultant resultset (say that a few times fast) on which you join table C in your second query is smaller, as you saw in your execution-plan results, than it is when you join table C solely to table A as in your first query.

So you've optimized things in the second query by joining table C on the smaller set of data that is the intersection of A and B, and the smaller set of data that is the intersection of A and C.

Adam Bernier 2009-06-14 15:52:46

Heartily disagree. Adding the second condition makes no difference to the actual result set sizes. As noted by others, it does change the optimizer's estimate of the result set sizes, thus the change in execution plan. It's wrong to say that he joined table C "solely to table A" in the first query; he joined A and B, then joined C to that result set. The join condition happened to reference only a column in A, but logically the ID columns in all three tables are required to be equal by the join conditions, so the results are the same regardless of which ones are used.

Dave Costa 2009-06-15 12:50:22

@Dave: Thank you for that thorough explanation. I feel that I've learned a lot from providing a poor answer to this question. Hopefully someone else can learn from this too.

Adam Bernier 2009-06-15 18:06:24

Answer 2

+2 A:

Oracle's optimizer doesn't make transitive assumptions about equality. Although we understand that if A = B and A = C, then B = C, Oracle doesn't assume there is a relationship between B & C unless there is one explicitly stated in the WHERE clause or JOIN conditions.

I'm assuming you have other constraints on A, B and/or C (as opposed to just selecting the entire contents of the tables - otherwise your I/O wouldn't be so low unless your tables were tiny, at which point optimization is somewhat moot). So there are really more constraints on A, B & C than what you specify. Oracle's optimizer will look at all tables in the FROM clause, list the constraints against them in the WHERE clause, and then determine the selectivity of the constraints based on the indexes for those tables. It will then go through various permutations of attack plans and determine which yields the most hope (those are the cardinality values you see in the plans). Without the B = C condition, it will exclude plans that start with B and proceed to C (or vice versa), and those may be the best plans possible.

Steve Broberg 2009-06-14 18:25:37

Yes, your assumption is correct. In fact, I couldn't even find the original query when I wrote this question, so I wrote a similar one including three tables with the said relationships. I have an inkling that the actual query produced even more dramatic differences. However, I did try (with the original query) to only include the B = C condition (but not the A = C) condition, and got similarly bad results. What was strange to me was that I had to have both A = C and B = C.

waxwing 2009-06-15 13:28:13

Answer 3

+2 A:

What you've got is two issues.

Firstly, with the original SQL, the optimizer makes an estimate about the number of rows in A with rows matching the ID in B which also have a matching row in C. The estimate is inaccurate, and the wrong plan is chosen.

Now, you add the redundant condition. Oracle assumes no conditions are truly redundant (as, if they were, an intelligent developer wouldn't include them). It also assumes that each condition is independent of the others. For example, a select where hair = 'bald' may get 10% of a table, a select where gender = 'F' may get 50%. Oracle would assume that a select where hair = 'bald' and gender = 'F' would give 5% (whereas in reality baldness is mostly limited to men).

By adding the 'redundant' predicate, Oracle will over-estimate the numbers or rows to be excluded and will choose the plan accordingly.

If, with the redundant predicate, Oracle is choosing a better plan, it suggests that the estimates for the original query over-estimated the number of rows matching. The redundant predicate is countering that with an under-estimate. And in this case, two wrongs are making a right.

It's not a solution I'd recommend, but if it works.....

PS. I'm assuming data types of all the IDs are consistent. If B.ID and C.ID are date and A.ID was character, or vice versa, then it is possible to have some rows where A.ID = B.ID and A.ID = C.ID but B.ID != C.ID, because implicit conversion may lose timestamps.

Gary 2009-06-14 23:06:09

I agree with you Gary: if the plan is better with redundant join conditions, it is because the statistics are inaccurate. In general, you should NOT provide redundant information.

Vincent Malgrat 2009-06-15 11:22:48

This is the most compelling answer to me, because it restores some hope in Oracle. (So yes, I am a little unfairly biased.) Whether it is the actual explanation is difficult for anyone to answer.

waxwing 2009-06-15 12:59:46

Answer 4

+3 A:

Interesting.

It seems that Oracle can infer this transitive equality only under some circumstances : They call it Transitive Closure and you should be able to benefit from it when query rewriting is enabled.

But to be on the safe side, better spell out the redundant predicate yourself.

Thilo 2009-06-15 06:54:30

Interesting article. Thanks!

waxwing 2009-06-15 15:36:35

ansaurus

tags:

views:

answers:

Adding redudant join conditions in Oracle results in a different plan

related questions