Hi,
Does the order of the columns in a WHERE clause affect performance?
e.g.
Say I put a column that has a higher potential for uniqueness first, or vice versa?
With a decent query optimiser: it shouldn't.
But in practice, I suspect it might.
You can only tell for your cases by measuring. And the measurements will likely change as the distribution of data changes in the database.
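One way to do that kind of measurement is to compare the plans the engine produces for each ordering. The following is only a rough sketch: it uses Python's built-in sqlite3 module as a stand-in engine, and the users table, its columns, and the idx_users_email index are made up for illustration. SQL Server's optimizer is a different piece of software, so there you would compare the estimated execution plans instead.

```python
import sqlite3

# Hypothetical schema: "email" is highly selective and indexed, "country" is not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, country TEXT);
    CREATE INDEX idx_users_email ON users (email);
""")
conn.executemany(
    "INSERT INTO users (email, country) VALUES (?, ?)",
    [(f"user{i}@example.com", "US" if i % 2 else "DE") for i in range(1000)],
)

params = {"email": "user42@example.com", "country": "DE"}

def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# Same two conditions, written in opposite orders.
selective_first = "SELECT id FROM users WHERE email = :email AND country = :country"
selective_last = "SELECT id FROM users WHERE country = :country AND email = :email"

plan_a = plan(selective_first)
plan_b = plan(selective_last)
rows_a = conn.execute(selective_first, params).fetchall()
rows_b = conn.execute(selective_last, params).fetchall()

print(plan_a)
print(plan_b)
print("same plan:", plan_a == plan_b)
```

If the two plans come out the same, the ordering made no difference for this query on this data; rerun the comparison as the data distribution changes.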
If you are ANDing conditions, the first one that is not true makes the whole expression false and the rest need not be evaluated, so order can affect performance.
It all depends on the DBMS, query optimizer and rules, but generally it does affect performance.
If a WHERE clause is ordered such that the first condition reduces the result set significantly, the remaining conditions only need to be evaluated for a smaller set. Following that logic, you can optimize a query based on the order of conditions in the WHERE clause.
For Transact-SQL there is a defined order of evaluation for the conditions in the WHERE clause. The optimizer may be able to detect when the order can be rearranged while remaining semantically equivalent, but I suspect that the transformations it applies are relatively simple, and that it is possible to construct a condition that performs suboptimally because of the ordering and grouping of its operators. Simplifying your search condition should improve the optimizer's ability to handle it.
Ex:
WHERE (a OR b) AND (b OR c)
could be simplified to
WHERE b OR (a AND c)
Clearly, in this case, if the query can be constructed to check b first, it may be able to skip the evaluation of a and c and thus run faster. Whether the optimizer can do this simple transformation I can't answer (it may be able to), but the point is that it probably can't do arbitrarily complex transformations, and you may be able to affect query performance by rearranging your condition.
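Before rewriting a condition by hand, it is worth checking that the two forms really are equivalent, including when a value is NULL. Here is a small sketch that brute-forces the truth table for the example above. It uses Python's built-in sqlite3 module as a stand-in for evaluating SQL three-valued logic; the parameter names :a, :b and :c simply mirror the letters in the example.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")

# SQL booleans can be false (0), true (1) or unknown (NULL, i.e. None here).
values = [0, 1, None]

original = "(:a OR :b) AND (:b OR :c)"
simplified = ":b OR (:a AND :c)"

mismatches = []
for a, b, c in itertools.product(values, repeat=3):
    params = {"a": a, "b": b, "c": c}
    left = conn.execute(f"SELECT {original}", params).fetchone()[0]
    right = conn.execute(f"SELECT {simplified}", params).fetchone()[0]
    if left != right:
        mismatches.append((a, b, c, left, right))

print(mismatches)  # prints [] because the two forms agree on all 27 combinations
```

This only shows that the rewrite is safe; whether it is faster still has to be measured on the real engine.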
EDIT: With regard to your question about ordering based on uniqueness, I would assume that any hints you can provide to the optimizer based on your knowledge (actual, not assumed) of the data couldn't hurt. Pretend that it won't do any optimization and construct your query as if you needed to order the conditions from most to least selective, but don't obsess about it until performance is actually a problem.
Quoting from the reference above:
The order of precedence for the logical operators is NOT (highest), followed by AND, followed by OR. The order of evaluation at the same precedence level is from left to right. Parentheses can be used to override this order in a search condition. For more information about how the logical operators operate on truth values, see AND, OR, and NOT.
Unless I have missed something here, this question is not about the query optimizer's interpretation of the precedence of logical operators, but rather about how the ordering of columns in the WHERE clause, based on selectivity, affects the query plan that is produced.
The query optimizer will determine the most efficient way to select the data you have requested, irrespective of the ordering of the SARGs (search arguments) defined in the WHERE clause.
The ordering is therefore determined by factors such as the selectivity of the column in question (which SQL Server knows based on statistics) and whether or not indexes can be used.
For SQL Server 2000 / 2005 / 2008, the query optimizer will usually give you identical results no matter how you arrange the columns in the WHERE clause. Having said this, over years of writing thousands of T-SQL commands I have found a few corner cases where the order altered the performance. Here are some characteristics of the queries that appeared to be subject to this problem:
1. If you have a large number of tables in your query (10 or more).
2. If you have several EXISTS, IN, NOT EXISTS, or NOT IN statements in your WHERE clause.
3. If you are using nested CTEs (common table expressions) or a large number of CTEs.
4. If you have a large number of sub-queries in your FROM clause.
Here are some tips on trying to evaluate the best way to resolve the performance issue quickly:
If the problem is related to 1 or 2, then try reordering the WHERE clause and compare the sub-tree cost of the queries in the estimated query plans.
If the problem is related to 3 or 4, then try moving the sub-queries and CTEs out of the query and have them load temporary tables. The query optimizer is FAR more efficient at estimating query plans if you reduce the number of complex joins and sub-queries in the body of the T-SQL statement.
If you are using temporary tables, then make certain you have specified primary keys for them. This means avoiding SELECT ... INTO ... FROM to generate the table. Instead, explicitly create the table and specify a primary key before using an INSERT INTO ... SELECT statement.
If you are using temporary tables and MANY processes on the server use temporary tables as well, then you may want to make a more permanent staging table that is truncated and reloaded during the query process. You are more likely to encounter disk contention issues if you are using tempdb to store your working / staging tables.
Move the statements in the WHERE clause that will filter the most data to the beginning of the WHERE clause. Please note that if this is your solution to the problem, then you will probably have poor performance again down the line, when the optimizer once more fails to generate and pick the best execution plan. You are BEST off finding a way to reduce the complexity of the query so that the order of the WHERE clause is no longer relevant.
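To illustrate the temporary-table tip above (create the table explicitly with a primary key, then load it with INSERT INTO ... SELECT), here is a rough sketch. It uses Python's built-in sqlite3 module as a stand-in, so CREATE TEMP TABLE ... AS SELECT plays the role of T-SQL's SELECT ... INTO, and the orders table and the staging table names are invented for the example. On SQL Server you would use a #temp table instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table for the example.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO orders (customer_id, total) VALUES (1, 10.0), (1, 25.5), (2, 7.25);

    -- Shortcut form (stand-in for SELECT ... INTO): the new table gets no key.
    CREATE TEMP TABLE staging_quick AS
        SELECT customer_id, SUM(total) AS total FROM orders GROUP BY customer_id;

    -- Explicit form: declare the key first, then load with INSERT ... SELECT.
    CREATE TEMP TABLE staging_keyed (
        customer_id INTEGER PRIMARY KEY,
        total       REAL NOT NULL
    );
    INSERT INTO staging_keyed (customer_id, total)
        SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id;
""")

def key_columns(table):
    """Names of the columns PRAGMA table_info reports as primary-key columns."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5] > 0]

print(key_columns("staging_quick"))  # prints []
print(key_columns("staging_keyed"))  # prints ['customer_id']
```

Both tables hold the same rows; only the explicitly created one gives the engine a key to work with when the staging table is joined later.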
I hope you find this information helpful. Good luck!