In general what makes an SQL query optimiser decide between a nested loop and a hash join.
views:
108answers:
2Typically, it's going to be dependent on the size of the sets that are being joined.
I highly recommend reading "Inside Microsoft SQL Server 2008: T-SQL Querying" by Itzik Ben-Gan:
http://www.solidq.com/insidetsql/books/insidetsql2008/
(the 2005 edition is just as applicable on this topic as well)
He goes into your question, as well as many others when it comes to getting the most out of your queries.
NESTED LOOPS
are good if the condition inside the loop is sargable, that is index can be used to limit the number of records.
For a query like this:
SELECT *
FROM a
JOIN b
ON b.b1 = a.a1
WHERE a.a2 = @myvar
, with a
leading, each record from a
will be taken and all corresponding records in b
should be found.
If b.b1
is indexed and has high cardinality, then NESTED LOOP
will be a preferred way.
In SQL Server
, it is also the only way to execute non-equijoins (something other than =
condition in the ON
clause)
HASH JOIN
is the fastest method if all (or almost all) records should be parsed.
It takes all records from b
, builds a hash table over them, then takes all records from a
and uses the value of the join column as a key to look up the hash table.
NESTED LOOPS
takes this time:Na * (Nb / C) * R
,where
Na
andNb
are the numbers of records ina
andb
,C
is the index cardinality, andR
is constant time required for the row lookup (1
is all fields inSELECT
,WHERE
andORDER BY
clauses are covered by the index, about10
if they are not)HASH JOIN
takes this time:Na + (Nb * H)
, where
H
is sum of constants required to build and lookup the hash table (per record). They are programmed into the engine.
SQL Server
computes the cardinality using the table statistics, computes and compares the two values and chooses the best plan.