views:

420

answers:

6

From a performance standpoint, does the order of my SQL WHERE statements make a difference?

For instance

SELECT ... FROM ...
WHERE a > 1
AND b < 2

Would that be any faster/slower than

SELECT ... FROM ...
WHERE b < 2
AND a > 1

Let's also assume that I know in advance that a > 1 will narrow the result set the most.

Also, does it matter if I'm joining two or more tables the order of my WHERE statements?

+2  A: 

No. The optimiser decides which order to filter results based upon current statistics.

Mitch Wheat
Is that true even if I know one clause will drastically reduce/filter the resultset more than another clause. Meaning, what happens if I know "a > 1" will drastically reduce the resultset moreso than "b < 2". Does the order then matter?
No - if you know it, the optimizer should too, and will probably use the more selective condition first. It depends on whether there are indexes to help, and whether a and b are in the same table, and ...
Jonathan Leffler
A: 

If it's from the same table, and the query is as simple as your example then, no it doesn't make a difference. As you get more complicated and link more tables, it can.

Zenshai
So what, in order, should come first? The filter than narrows down the result set the most - first ... or last in the order of the WHERE clause?
+11  A: 

In theory, there is no difference.

Occasionally, especially with the simpler optimizers, there are differences in the query plan depending on the order of the clauses in the WHERE clause. There's a moderately strong argument that such differences are symptomatic of a bug.

Similar comments apply to join order, too. The order of the joins should not matter - for joins of the same type. Clearly, whether a table Table2 is inner joined or outer joined to another table Table1 does matter - and it matters whether it is Table1 LEFT JOIN Table2 or Table1 RIGHT JOIN Table2 or Table1 FULL JOIN Table2. But for a series of INNER JOIN operations, the sequencing should not matter. The processing order may be forced, to some extent, if you are dealing with a chain of joins.

Clarifying (again) - consider:

(Table1 AS t1 JOIN Table2 AS t2 ON t1.pkcol = t2.fkcol) AS j1
JOIN
(Table3 AS t3 JOIN Table4 AS t4 ON t3.pkcol = t4.fkcol) AS j2
ON j1.somecol = j2.anothercol

The way it is written, clearly the programmer expects the joins on (t1, t2) and (t3, t4) to be executed before the join on (j1, j2), but the optimizer may be able to do the joins differently. For example, if j1.somecol derives from Table1 and j2.anothercol derives from Table4, the optimizer may be able to choose the join on Table1.SomeCol = Table4.AnotherCol over either of the other joins. This sort of issue can be influenced by the filter conditions in the WHERE clause, and by the presence or absence of appropriate indexes on the various tables. This is where statistics can play a big part in the way the optimizer generates the query plan.

Jonathan Leffler
+1 for the details, explaining query optimization in a paragraph or two is a pretty tall order.
rick schott
Not disagreeing, just sharing an unusual example... We ran into a scenario where a third-party encryption routine used in SQL 2000 would severely impact the performance of a query if we included it as the first expression in a where clause as opposed to the last expression. When first, decryption was applied against the entire set of records in the table. When last, decryption was applied against a smaller subset of records in the table. The queries in question went from excessive (30 sec, 3 min, etc.) to acceptable (1-2 seconds).
Mayo
@Mayo: interesting example...ideally, there would be a way for 'you' (the providers of the 3rd party encryption routine) to tell the optimizer that 'this function costs an arm and a leg; invoke when necessary, but don't invoke until you know it is necessary'. If, as I expect, the optimizer simply assumed "it's a function; it costs as much as LENGTH() or some other function", then it could make misguided judgements about the sequence. What you highlight is that if a cost-based optimizer does not have a good idea of the cost of something, it will sometimes make lousy decisions. Thanks for this!
Jonathan Leffler
+7  A: 

No, it doesn't. Most modern SQL servers include a query optimizer which looks into all the plausible (*) ways of resolving a query and whereby older servers may take hints based on the order within the SELECT clause, newer servers do not.

The order of the JOINs on the other hand still matter to a greater extent.

Edit: Do see Jonathan's Leffler's response for he provides additional detail in particular regarding the order of JOINs. Thanks you, Jonathan!

Edit: ( * ) Plausible vs. Possible: As pointed out by Erikkalen, the optimizer does not look into all of the possible ways, thanks to [pretty good] heuristics coded in its logic, it will only evaluate the plausible plans, on the basis of the statistics it keeps for the underlying indexes. For each of the plans it considers an overall cost is estimated (or partially so, when partial costs readily exceed the overall cost of another plan [pruning]), and that's how the plan effectively used is eventually selected. While the general principles used by SQL query optimizers are well known, the intricacies of their implementation introduce many different twists-and-turns.

mjv
The order of the joins should not matter - for joins of the same type. Clearly, whether a table Table2 is inner joined or outer joined to another table Table1 does matter - and it matters whether it is Table1 LEFT JOIN Table2 or Table2 LEFT JOIN Table1. But for INNER JOIN, the sequencing should not matter. The processing order may be forced, to some extent, if you are dealing with a star join, for example.
Jonathan Leffler
@jonathan, can you say that again ... you lost me a little (but it sounds very interesting)
See the added material to my answer :D
Jonathan Leffler
No, it will not look in to "all of the possible ways" of resolving a query since that would be O(n!). It will take into account some of them and hopefully the best ones. You might, by sheer luck, get a faster execution plan by reordering items in the where clause.
erikkallen
@erikkalen. Agreed on the _all possible ways_, see my edit. Now, as to moving things in the input query in order try one's "sheer luck" at outsmarting the optimizer, I can't dismiss it, but the odds of such a thing are rather low... A better way to respectfully [;-)] help the optimizer, if the SQL implementation allows it, is by way of the explicit HINT construct.
mjv
+5  A: 

See below and follow the link(long article but worth the read):

SQL Server Transact-SQL WHERE

If a WHERE clause includes multiple expressions, there is generally no performance benefit gained by ordering the various expressions in any particular order. This is because the SQL Server Query Optimizer does this for you, saving you the effort. There are a few exceptions to this, which are discussed on this web site. [7.0, 2000, 2005] Added 1-24-2006

rick schott
A: 

It depends on the DBMS. SQL itself does not say anything about how a query should execute. It is up to the specific implementation.

If your DBMS had the very simplistic model of interpreting the query sequentially, then putting a > 1 first in your example would (obviously) be faster - because the DBMS would make two passes of which the second pass is through a much smaller resultset.

Dawie Strauss