views:

741

answers:

8

Is there any difference (performance, best-practice, etc...) between putting a condition in the JOIN clause vs. the WHERE clause?

For example...

-- Condition in JOIN
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD 
ON CUS.CustomerID = ORD.CustomerID
AND CUS.FirstName = 'John'

-- Condition in WHERE
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD 
ON CUS.CustomerID = ORD.CustomerID
WHERE CUS.FirstName = 'John'

Which do you prefer (and perhaps why)?

+2  A: 

WHERE will filter after the JOIN has occurred.

Filter on the JOIN to prevent rows from being added during the JOIN process.

TheTXI
Semantically, they are prevented during the INNER JOIN process, but the optimizer can rearrange INNER JOIN and WHERE predicates at will, so the optimizer is free to exclude them later if it wishes.
Cade Roux
Cade Roux: Right. Often times what you write in SQL isn't what the optimizer will give you when all is said and done. I would suppose then that this would be right in an all-theory world, while your answer is of course more correct in the world of automatic query optimizers :)
TheTXI
+1  A: 

I prefer the JOIN to join full tables/Views and then use the WHERE To introduce the predicate of the resulting set.

It feels syntactically cleaner.

John Nolan
A: 

Putting the condition in the join seems "semantically wrong" to me, as that's not what JOINs are "for". But that's very qualitative.

Additional problem: if you decide to switch from an inner join to, say, a right join, having the condition be inside the JOIN could lead to unexpected results.

Jacob B
+1  A: 

Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference.

I prefer to keep join conditions separate from query restriction conditions.

If you're using OUTER JOIN sometimes it's necessary to put conditions in the join clause.

Bill Karwin
I agree with you that syntactically it's cleaner, and I have to defer to your knowledge of that book and your very high reputation, but I can think of 4 queries in the last week with very different execution plans, CPU times, and logical reads when I moved where predicates to the join.
marr75
You were asking about best practices. As soon as you get into testing how a specific RDBMS implementation works, other folks have given the correct advice: benchmark.
Bill Karwin
+1  A: 

I typically see performance increases when filtering on the join. Especially if you can join on indexed columns for both tables. You should be able to cut down on logical reads with most queries doing this too, which is, in a high volume environment, a much better performance indicator than execution time.

I'm always mildly amused when someone shows their SQL benchmarking and they've executed both versions of a sproc 50,000 times at midnight on the dev server and compare the average times.

marr75
+6  A: 

The relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearrranged by the optimizer so that they may already be excluded during the JOIN process.

I recommend you write the queries in the most readble way possible.

Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.

For example, instead of:

SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
    AND c.State = 'NY'
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
    AND a.Status = 1

Write:

SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
    AND a.Status = 1

But it depends, of course.

Cade Roux
Upped for adding it depends. Benchmark everything.
marr75
A: 

Joins are quicker in my opinion when you have a larger table. It really isn't that much of a difference though especially if you are dealing with a rather smaller table. When I first learned about joins, i was told that conditions in joins are just like where clause conditions and that i could use them interchangeably if the where clause was specific about which table to do the condition on.

Eric
+3  A: 

For inner joins I have not really noticed a difference (but as with all performance tuning, you need to check against your database under your conditions).

However where you put the condition makes a huge difference if you are using left or right joins. For instance consider these two queries:

SELECT *
FROM dbo.Customers AS CUS 
LEFT JOIN dbo.Orders AS ORD 
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderDate >'20090515'

SELECT *
FROM dbo.Customers AS CUS 
LEFT JOIN dbo.Orders AS ORD 
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'

The first will give you only those records that have an order dated later than May 15, 2009 thus converting the left join to an inner join. The second will give those records plus any customers with no orders. The results set is very differnt depending on where you put the condition. (Select * if for example purposes only, you should not use of course in production code.) The exception to this is when you want to see only the records in one table but not the other. Then you use the where clause for the condition not the join.

SELECT *
FROM dbo.Customers AS CUS 
LEFT JOIN dbo.Orders AS ORD 
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderID is null
HLGEM