views:

271

answers:

5

A coworker asked me to look at indexing on some tables because his query was running very long. Over an hour.

select count(1)
from databaseA.dbo.table1
inner join databaseA.dbo.table2 on (table1.key = table2.key)
inner join databaseB.dbo.table3 on (table1.key = table3.key)

Note the different databases. This was being run from DatabaseB

Tables 1 and 2 were over 2 million records long. Table3 had a dozen records or so.

I looked at the query plan and the optimizer decided to do nested-loop index seeks into tables 1 and 2 with Table3 as the driving table!

My first assumption was that statistics were seriously messed up on Tables1 & 2 but before updating statistics I tried adding a join hint thusly:

select count(1)
from databaseA.dbo.table1
inner HASH join databaseA.dbo.table2 on (table1.key = table2.key)
inner join databaseB.dbo.table3 on (table1.key = table3.key)

Results returned in 15 seconds.

Since I was short on time, I passed the results back to him but I'm worried that this might result in problems down the road.

Should I revisit the statistics issue and resolve the problem that way? Could the bad query plan have resulted from the join being from a separate databases?

Can anyone offer me some ideas based on your experience?

+2  A: 

I would suspect the statistics first.

As you are no doubt aware, Join hints should be avoided in 99% of cases and used only when you have proof that they are absolutely required.

Mitch Wheat
+1  A: 

Check statistics, and indexing on the table first. Index hints can cause problems. If the data in the tables changes the optimizer will be unable to choose a more efficent plan since you have forced it to always use a hash.

JoshBerke
+1  A: 

Wouldn't a nested loop be the most appropiate? Take the 12 records from Table 3, ,match to the 12 records in Table 1, match to 12 records in Table 2.

Otherwise, your hash join would enforce ordering as well - meaning you'd hash 1 million records from Table 1 and Table 2, then join to the 12 records in Table 3.

I'd look at statistics for both the plans - and I'd suspect the loop join is actually more efficient, but was blocked or your hash join was taking advantage of cached data.

But - yeah - in general, join hints are a last resort.

Mark Brackett
A: 

Thanks for the feedback!

I'm no expert at Query Tuning and you might have a good argument there Mark. The plan was looping over 2 million records with an index seek to retrieve those 12 records though. Bad statistics might be the reason it was taking so long and even with an update it will still choose the same query plan but execute faster. There may be some disk IO issues that periodically crop up affecting the looping as well.

To the broader question though, has anyone actually experienced a problem using a join hint? Especially when we're getting acceptable results on what are essentially large, slowly changing tables.

TrickyNixon
Yes, a developer I knew put an index hint on a table, which then changed dramatically over time and the query started performing poorly.
JoshBerke
That's exactly the reason I wouldn't use one except in exceptional vary rare circumstances.
Mitch Wheat
Very rare circumstances, and with procedures in place to monitor the system over time to ensure it is still behaving properly
JoshBerke
With the loop join, Sql would loop over the 12 records in Table 3 and do an index seek into Table 1. That's not looping over millions of records - but only 12, and is undoubtedly the most efficient way of running that join. I suspect you misread the query plan.
Mark Brackett
A: 

A slow-running query involving linked servers might have to do with collation. See here for some background: http://blogs.msdn.com/psssql/archive/2008/02/14/how-it-works-linked-servers-and-collation-compatibility.aspx The hash join hint forces the sortorder, so that explains the performance gain.

Here's how to set the options:

EXEC master.dbo.sp_serveroption 
    @server=N'databaseA', 
    @optname=N'collation compatible', 
    @optvalue=N'true'

EXEC master.dbo.sp_serveroption 
    @server=N'databaseA', 
    @optname=N'use remote collation', 
    @optvalue=N'false'

-Edoode

edosoft