views:

212

answers:

2

Hi all,

Recently, I was trying to optimise this query

UPDATE Analytics
SET UserID = x.UserID
FROM Analytics z 
INNER JOIN UserDetail x ON x.UserGUID = z.UserGUID

Estimated execution plan show 57% on the Table Update and 40% on a Hash Match (Aggregate). I did some snooping around and came across the topic of JOIN hints. So I added a LOOP hint to my inner join and WA-ZHAM! The new execution plan shows 38% on the Table Update and 58% on an Index Seek.

So I was about to start applying LOOP hints to all my queries until prudence got the better of me. After some googling, I realised that JOIN hints are not very well covered in BOL. Therefore...

  1. Can someone please tell me why applying LOOP hints to all my queries is a bad idea. I read somewhere that a LOOP JOIN is default JOIN method for query optimiser but couldn't verify the validity of the statement?
  2. When are JOIN hints used? When the sh*t hits the fan and ghost busters ain't in town?
  3. What's the difference between LOOP, HASH and MERGE hints? BOL states that MERGE seems to be the slowest but what is the application of each hint?

Thanks for your time and help people!

I'm running SQL Server 2008 BTW. The statistics mentioned above are ESTIMATED execution plans.

+3  A: 

Can someone please tell me why applying LOOP hints to all my queries is a bad idea. I read somewhere that a LOOP JOIN is default JOIN method for query optimiser but couldn't verify the validity of the statement?

Because this robs the optimizer of the opportunity to consider other methods which can be more efficient.

When are JOIN hints used? When the sh*t hits the fan and ghost busters ain't in town?

When the data distribution (on which the optimizer makes its decisions) is severely skewed and the statistics are no able to represent it correctly.

What's the difference between LOOP, HASH and MERGE hints? BOL states that MERGE seems to be the slowest but what is the application of each hint?

These are different algorithms.

  1. LOOP is nested loops: for each record from the outer table, the inner table is searched for matches (using the index of available). Fastest when only a tiny portion of records from both tables satisfy the JOIN and the WHERE conditions.

  2. MERGE sorts both tables are traverses them in the sort order, skipping the unmatched records. Fastest for the FULL JOINs and when both recordsets are already sorted (from previous sort operations or when the index access path is used)

  3. HASH build a hash table in the temporary storage (memory or tempdb) from one of the tables and searches it for each record from the other one. Fastest if the large portion of records from either table matches the WHERE and JOIN condition.

Quassnoi
Great explanation! I don't suppose you could give your 2 cents on what Martin's reply?
Nai
+2  A: 

The Estimated execution plan show 57% on the Table Update and 40% on a Hash Match (Aggregate). I did some snooping around and came across the topic of JOIN hints. So I added a LOOP hint to my inner join and WA-ZHAM! The new execution plan shows 38% on the Table Update and 58% on an Index Seek.

Surely that means that your proposed plan is worse? Assuming the table update takes a constant time it is now being out costed by the index activity.

Martin Smith
Is that a fair assumption? I have always been under the impression that having the bulk of the work on Index Seek is the way to go.
Nai
I think its a fair assumption though I'm happy to stand corrected if I'm wrong. Is there a clustered index on Analytics.UserGUID? If so updating the GUID to a different value will likely cause a fair bit of IO which may explain any performance issue you are getting.
Martin Smith
I have a non-clustered index on Analytics.UserGUID. I'm updating the UserID columns not the UserGUID. I have no indexes on the Analytics.UserID column.
Nai
Sorry didn't read that bit properly!
Martin Smith