Ok so I realize that this is a pretty vague question, but bear with me.

I have experienced this problem on numerous occasions with different and unrelated queries. The query below takes many minutes to execute:

SELECT <Fields>
FROM <Multiple Tables Joined>
    LEFT JOIN (SELECT <Fields> FROM <Multiple Tables Joined> ) ON <Condition>

However, just by adding a join hint, the query executes in seconds:

SELECT <Fields>
FROM <Multiple Tables Joined>
    LEFT HASH JOIN (SELECT <Fields> FROM <Multiple Tables Joined> ) ON <Condition>

The strange thing is that the type of JOIN specified in the hint is not really what improves the performance. It appears to be because the hint causes the optimizer to execute the subquery in isolation and then join. I see the same performance improvement if I create a table-valued function (not an inline one) for the subquery, e.g.:

SELECT <Fields>
FROM <Multiple Tables Joined>
    LEFT JOIN dbo.MySubQueryFunction() ON <Condition>

Anybody have any ideas why the optimizer is so dumb in this case?

+9  A: 

If any of those tables are table variables, the optimizer uses a bad estimate of very few rows (typically 1, regardless of the actual count) and usually chooses nested loops as the join technique.

It does this due to a lack of statistics on the tables involved.
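
To make that reasoning concrete, here is a toy cost model in Python. It is only a sketch: the cost formulas, the constants, and the names (`nested_loop_cost`, `hash_join_cost`, `pick_join`) are made up for illustration and are not SQL Server's actual costing. It shows how a row estimate of 1 can make nested loops look cheapest even when the real row count makes them far more expensive.

```python
def nested_loop_cost(outer_rows, inner_scan_cost):
    # Toy assumption: one full pass over the inner input per outer row.
    return outer_rows * inner_scan_cost


def hash_join_cost(outer_rows, inner_scan_cost, build_overhead=2):
    # Toy assumption: build the hash table once, then one cheap probe per outer row.
    return build_overhead * inner_scan_cost + outer_rows


def pick_join(outer_rows, inner_scan_cost):
    # Choose whichever strategy looks cheaper for the given row count.
    nl = nested_loop_cost(outer_rows, inner_scan_cost)
    hj = hash_join_cost(outer_rows, inner_scan_cost)
    return "nested loop" if nl <= hj else "hash"


inner_scan_cost = 5000   # hypothetical cost of one pass over the inner input
estimated_rows = 1       # what the optimizer guesses without statistics
actual_rows = 20000      # hypothetical number of rows really present

# The plan is chosen from the estimate, not from the actual row count.
choice = pick_join(estimated_rows, inner_scan_cost)

# What each strategy would really cost once the actual rows flow through.
actual_nl = nested_loop_cost(actual_rows, inner_scan_cost)
actual_hj = hash_join_cost(actual_rows, inner_scan_cost)

print(choice, actual_nl, actual_hj)
```

With these made-up numbers the model picks nested loops from the estimate, while the same model fed the actual row count would pick the hash join by a wide margin.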

David B
I'm not using table variables, but there are often views in the subquery. Your reasoning does make sense to me though.
Darrel Miller
When I remove the join hint, the query plan changes substantially and it does introduce nested loops. I cannot find where it is making the bad estimate of rows but I can't spend any more time looking.
Darrel Miller
+6  A: 

The optimizer is an algorithm. It is not dumb or smart; it works the way it is programmed.

A hash join implies building a hash table on the smaller row source, which is why the inner query must be executed first.

In the first case, the optimizer might have chosen a nested loop: it pushed the join condition into the inner query and executed the inner query on each iteration with an additional predicate. If it could not find an appropriate index for this predicate, a full table scan took place on each iteration.

It's hard to say why this happens unless you post your exact query and how many rows are in your tables.

With a table function, it's impossible to push a join condition into the inner query, which is why it is executed only once.
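
The difference between the two strategies can be sketched in Python. This is a simplified simulation, not how the engine is implemented: the function names, the `scans` counter, and the sample rows are all hypothetical, and the nested loop here deliberately models the worst case described above, where no index helps and the inner query is fully re-run for each outer row.

```python
from collections import defaultdict


def inner_query(rows, scans):
    """Stand-in for the derived table; each call is one full execution."""
    scans[0] += 1
    return list(rows)


def nested_loop_left_join(outer, inner_rows, key):
    """Re-run the inner query for every outer row and filter on the join key."""
    scans = [0]
    result = []
    for o in outer:
        matches = [i for i in inner_query(inner_rows, scans) if i[key] == o[key]]
        if matches:
            result.extend((o, m) for m in matches)
        else:
            result.append((o, None))  # LEFT JOIN keeps unmatched outer rows
    return result, scans[0]


def hash_left_join(outer, inner_rows, key):
    """Run the inner query once, build a hash table, then probe per outer row."""
    scans = [0]
    table = defaultdict(list)
    for i in inner_query(inner_rows, scans):
        table[i[key]].append(i)
    result = []
    for o in outer:
        matches = table.get(o[key])
        if matches:
            result.extend((o, m) for m in matches)
        else:
            result.append((o, None))  # LEFT JOIN keeps unmatched outer rows
    return result, scans[0]


# Hypothetical data: 1000 outer rows, 500 inner rows with even ids only.
outer = [{"id": n} for n in range(1000)]
inner = [{"id": n, "val": n * 10} for n in range(0, 1000, 2)]

nl_rows, nl_scans = nested_loop_left_join(outer, inner, "id")
hj_rows, hj_scans = hash_left_join(outer, inner, "id")

print("inner executions:", nl_scans, "vs", hj_scans)
```

Both functions return the same rows, but the nested loop executes the inner query once per outer row, while the hash join executes it a single time. That gap grows with the size of the outer input.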

Quassnoi
I agree that is what seems to be happening. I just don't know why the optimizer chooses to do a nested loop.
Darrel Miller
It's hard to say; we need to see the exact query and how many rows are in the tables.
Quassnoi
I've tried reducing the query down, but the smallest I can get it whilst still reproducing the problem is 43 lines. I don't want to put anyone through the pain of trying to analyze that without the database.
Darrel Miller
A: 

Inside SQL Server 2005: T-SQL Querying answers these and many other questions. It is one of the finest looks under the hood of T-SQL data retrieval and verb processing I've ever seen. (No, I am not an author of the book, nor am I affiliated with any author or authors of the book, or Microsoft, or Microsoft Press. This is simply an incredible work, and various DBAs I've turned on to it over the past couple of years agree.)

andrewbadera
I know Itzik - he is probably one of the smartest SQL people on the planet.
keithwarren7