views:

765

answers:

7

I am building a view in SQL Server 2000 (and 2005) and I've noticed that the order of the join statements greatly affects the execution plan and speed of the query.

select   sr.WTSASessionRangeID,
   -- bunch of other columns
from   WTSAVW_UserSessionRange us
inner join  WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStream srs on srs.WTSASessionRangeID = sr.WTSASessionRangeID
--left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID

On SQL Server 2000, the query above consistently generates a plan of cost 946. If I uncomment the MO_Stream join in the middle of the query and comment out the one at the bottom, the cost drops to 263. The execution speed drops accordingly. I always thought that the query optimizer would interpret the query appropriately without considering join order, but it seems that order matters.

So since order does seem to matter, is there a join strategy I should be following for writing faster queries?

(Incidentally, on SQL Server 2005, with almost identical data, the query plan costs were 0.675 and 0.631 respectively.)

Edit: On SQL Server 2000, here are the profiled stats:

  • 946-cost query: 9094ms CPU, 5121 reads, 0 writes, 10123ms duration
  • 263-cost query: 172ms CPU, 7477 reads, 0 writes, 170ms duration

Edit: Here is the logical structure of the tables.

SessionRange ---+--- SessionRangeTutor
                |--- SessionRangeClass
                |--- SessionRangeStream --- MO_Stream
                |--- SessionRangeEnrolmentPeriod
                |--- SessionRangeStudent
                +----SessionSubrange --- SessionSubrangeRoom

Edit: Thanks to Alex and gbn for pointing me in the right direction. I also found this question.

Here's the new query:

select sr.WTSASessionRangeID    // + lots of columns

from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID

// SessionRangeStream is a many-to-many mapping table between SessionRange and MO_Stream
left outer join (
    WTSA_SessionRangeStream srs
    inner join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
) on srs.WTSASessionRangeID = sr.WTSASessionRangeID

// SessionRanges MAY have Subranges and Subranges MAY have Rooms
left outer join (
    WTSA_SessionSubrange ssr    
    left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) on ssr.WTSASessionRangeID = sr.WTSASessionRangeID

SQLServer2000 cost: 24.9

+2  A: 

Obviously, the SQL Server 2005 optimizer is a lot better than the SQL Server 2000 one.

However, there's a lot of truth in your question. Outer joins will cause execution to vary wildly based on order (inner joins tend to be optimized to the most efficient route, but again, order matters). If you think about it, as you build up left joins, you need to figure out what the heck is on the left. As such, each join must be calculated before every other join can be done. It becomes sequential, and not parallel. Now, obviously, there are things you can do to combat this (such as indexes, views, etc). But, the point stands: The table needs to know what's on the left before it can do a left outer join. And if you just keep adding joins, you're getting more and more abstraction to what, exactly is on the left (especially if you use joined tables as the left table!).

With inner joins, however, you can parallelize those quite a bit, so there's less of a dramatic difference as far as order's concerned.

Eric
Thanks for the input Eric - I'm going to do some more re-arranging to see if I can get the cost down further. There are indexes on all the columns used for joining. Unfortunately the nature of the data means that I have to use outer joins for this one.
geofftnz
"Obviously, the SQL Server 2005 optimizer is a lot better than the SQL Server 2000 one."That's an understatement. Not only is the SQL Server 2000 query optimizer feeble (along with the rest of the 2000 edition, in my opinion), it often seems to over-optimize, making false assumptions that result in hilariously incorrect data. A union of views, or a view containing a union, or a view containing a view (!) can trigger this idiocy once you cross a certain complexity threshold. I've never been able to pinpoint the nature of the problem exactly, but I've often been bitten by it.
WCWedin
+1  A: 

it depends on which of the join fields are indexed - if it has to table scan the first field, but use an index on the second, it's slow. If your first join field is an index, it'll be quicker. My guess is that 2005 optimizes it better by determining the indexed fields and performing those first

Luke Schafer
+2  A: 

A general strategy for optimizing queries containing JOINs is to look at your data model and the data and try to determine which JOINs will reduce number of records that must be considered the most quickly. The fewer records that must be considered, the faster the query will run. The server will generally produce a better query plan too.

Along with the above optimization make sure that any fields used in JOINs are indexed

Jeff Leonard
+1  A: 

At DevConnections a few years ago a session on SQL Server performance stated that (a) order of outer joins DOES matter, and (b) when a query has a lot of joins, it will not look at all of them before making a determination on a plan. If you know you have joins that will help speed up a query, they should be early on in the FROM list (if you can).

Joe
+4  A: 

I have to disagree with all previous answers, and the reason is simple: if you change the order of your left join, your queries are logically different and as such they produce different result sets. See for yourself:

SELECT 1 AS a INTO #t1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4;

SELECT 1 AS b INTO #t2
UNION ALL SELECT 2;

SELECT 1 AS c INTO #t3
UNION ALL SELECT 3;

SELECT a, b, c 
FROM #t1 LEFT JOIN #t2 ON #t1.a=#t2.b
  LEFT JOIN #t3 ON #t2.b=#t3.c
ORDER BY a;

SELECT a, b, c 
FROM #t1 LEFT JOIN #t3 ON #t1.a=#t3.c
  LEFT JOIN #t2 ON #t3.c=#t2.b
ORDER BY a;

a           b           c
----------- ----------- -----------
1           1           1
2           2           NULL
3           NULL        NULL
4           NULL        NULL

(4 row(s) affected)

a           b           c
----------- ----------- -----------
1           1           1
2           NULL        NULL
3           NULL        3
4           NULL        NULL
AlexKuznetsov
It depends on the structure of the tables. You're correct for the T1-T2, T2-T3 case. In my situation it's T1-T2, T1-T3.
geofftnz
@geofftnz: see my answer. You are *not* T1-T2, T1-T3
gbn
+1  A: 

You query is probably wrong anyway. Alex is correct. Eric may be correct too, but the query is wrong.

Lets' take this subset:

WTSA_SessionRange sr
left outer join
WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID

You are joining WTSA_SessionSubrangeRoom onto WTSA_SessionSubrange. You may have no rows from WTSA_SessionSubrange.

The join should be this:

WTSA_SessionRange sr
left outer join
(SELECT WTSASessionRangeID, columns I need
FROM
    WTSA_SessionSubrange ssr
    left outer join
    WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) foo on foo.WTSASessionRangeID = sr.WTSASessionRangeID

This is why the join order is affecting results because it's a different query, declaratively speaking.

You'd also need to change the MO_Stream and WTSA_SessionRangeStream join too.

gbn
What situations would result in these two queries returning different results?
geofftnz
Exactly as Alex demonstrated...
gbn
But in the T1-T2 T2-T3 situation, if there was a foreign key relationship between T2 and T3 such that a row in T3 cannot exist without referencing a row in T2, does it still matter?
geofftnz
yes. It's because of the value-null-null or value-value-null type situation.
gbn
(Horrible sinking feeling about existing code) I've reformatted my query and added it to the question. Is this logically the same as your subquery?
geofftnz
+1  A: 

The join order does make a difference to the resulting query. This is documented in BOL in the docs for FROM:

<joined_table>

Is a result set that is the product of two or more tables. For multiple joins, use parentheses to change the natural order of the joins.

You can alter the join order using parenthesis around the joins (BOL does show this in the syntax at the top of the docs, but it is easy to miss).

This is known as chiastic behaviour. You can also use the query hint OPTION (FORCE ORDER) to force a specific join order, but this can result in what are called "bushy plans" which may not be the most optimal for the query being executed.

adrianbanks