I use a view based on a complex query with 17 joins (both inner and left/right outer) and subqueries. Selecting all rows from the view takes about 5 seconds:

SELECT * FROM a_view;

One of the view columns has type BIT. When I filter rows by comparing it with 1, the query again runs in about 5 seconds:

SELECT * FROM a_view WHERE c = 1;

But when I compare this BIT column with 0, the query takes about 50 seconds (10 times slower):

SELECT * FROM a_view WHERE c = 0;

This query, which returns the same result rows, runs in about 10 seconds, as expected:

SELECT * FROM a_view 
EXCEPT
SELECT * FROM a_view WHERE c = 1;

So I wonder why comparing with 0 (or 'FALSE') takes so much more time. Any ideas, please?

Sorting on this BIT field is fast. Filtering by other columns is fast too.
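
One way to get comparable numbers for both filters is to turn on SQL Server's statistics output before running them; a sketch (a_view and c are the names above):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM a_view WHERE c = 1;  -- fast case
SELECT * FROM a_view WHERE c = 0;  -- slow case

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

The Messages tab then shows logical reads and CPU/elapsed time per statement, which makes it clear whether the slow filter does more I/O or just returns more rows.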

+1  A: 

The SQL Server engine expands the view's underlying SQL query into the statement you wrote against the view and then tries to optimize the result.

This can lead to a situation where, for c=0, the statistics on the underlying tables show many more rows matching the predicate than for c=1. For example, with c=1 the table that contains the c column at the center of the joins might return just 5 matching rows, which leads to a very different execution plan than when the table returns 1 million rows (which might be the case for c=0).
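
A quick way to check whether the data really is skewed like that is to count the values of c in the base table that supplies it (the table name here is just a placeholder):

SELECT c, COUNT(*) AS row_count
FROM dbo.BaseTableWithC   -- placeholder; use the real base table
GROUP BY c;

If one value accounts for nearly all rows, the optimizer has good reason to choose a very different plan for each filter.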

So examine the execution plans for both. Also examine the profiler results for both: with c=0 there may be far more reads than with c=1, and far more rows being returned. Returning all those rows takes time, so that too might be why the query is slower.
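
To compare the plans without waiting for the slow query to finish, you can ask SQL Server for the estimated plans instead of executing the statements; a sketch for SSMS (SET SHOWPLAN_XML must be the only statement in its batch, hence the GO separators):

SET SHOWPLAN_XML ON;
GO
SELECT * FROM a_view WHERE c = 1;
SELECT * FROM a_view WHERE c = 0;
GO
SET SHOWPLAN_XML OFF;
GO

Comparing the estimated row counts in the two plans should show where the optimizer's assumptions diverge.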

Frans Bouma
+1  A: 

Usually there is more than one way to execute a query involving joins. All modern RDBMSs search through the possible join plans, estimating the cost (CPU and disk access time) of each to find the best one.

The problem is that each additional join multiplies the number of possible plans by an ever larger factor, resulting in double-factorial (worse than exponential) growth in the number of plans to consider. For that reason, the DB has to limit the search somewhere, which means suboptimal plans inevitably get chosen for queries involving many joins.

For reference, PostgreSQL by default stops evaluating all possible plans once a query reaches 12 joins. SQL Server will have a similar limit for sure. An exhaustive search over 17 joins would take (2 * 13) * (2 * 14) * (2 * 15) * (2 * 16) * (2 * 17) times as long as over 12 -- more than enough to overwhelm any RDBMS that has ever existed, or ever will.
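
(For what it's worth, that product works out to 26 * 28 * 30 * 32 * 34 = 23,761,920, i.e. roughly 2.4 * 10^7 times as much work.)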

There is also the fact that DBs estimate costs based on crude statistics, such as the number of distinct values in a column and/or a list of the 10 most common values in a column. It all adds up to the fact that, as the number of joins goes up, the likelihood of choosing the best (or even a reasonable) join strategy goes way down.
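
If you want to see what SQL Server actually knows about the c column, you can inspect the statistics object behind it directly; a sketch, where the table and statistics names are placeholders for whatever your schema uses:

-- list the statistics objects on the base table that supplies c
EXEC sp_helpstats 'dbo.BaseTableWithC', 'ALL';

-- then dump the density and histogram of the one covering c
DBCC SHOW_STATISTICS ('dbo.BaseTableWithC', stats_name_for_c);

The histogram rows are the row-count estimates the optimizer plugs into its cost model.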

Why do you need to join 17 tables? Is there no way you can simplify your DB schema?

j_random_hacker