tags:

views:

192

answers:

6

Given these two queries:

Select t1.id, t2.companyName 
from table1 t1 
  INNER JOIN table2 t2 on t2.id = t1.fkId 
WHERE t2.aField <> 'C'

OR:

Select t1.id, t2.companyName 
from table1 t1 
  INNER JOIN table2 t2 on t2.id = t1.fkId  and t2.aField <> 'C'

Is there a demonstrable difference between the two? Seems to me that the clause "t2.aField <> 'C'" will run on every row in t2 that meets the join criteria regardless. Am I incorrect?

Update: I did an "Include Actual Execution Plan" in SQL Server. The two queries were identical.

+4  A: 

I prefer to use the Join criteria for explaining how the tables are joined together. So I would place the additional clause in the where section.

I hope (although I have no stats), that SQL Server would be clever enough to find the optimal query plan regardless of the syntax you use.

HOWEVER, if you have indexes which also have id, and aField in them, I would suggest placing them together in the inner join criteria.

It would be interesting to see the query plan's in these 2 (or 3) scenarios, and see what happens. Nice question.

Bravax
I agree, if it's part of an index it should be in the ON clause and if it's filtering criteria it should go in the WHERE clause. This has got enough upvotes to call it the answer.
jcollum
+3  A: 

There is a difference. You should do an EXPLAIN PLAN for both of the selects and see it in detail.

As for a simplier explanation: The WHERE clause gets executed only after the joining of the two tables, so it executes for each row returned from the join and not nececerally every one from table2.

Performance wise its best to eliminate unwanted results early on so there should be less rows for joins, where clauses or other operations to deal with later on.

In the second example, there are 2 columns that have to be same for the rows to be joined together so it usually will give different results than the first one.

Azder
A database product as stupid as that would be braindead. Of course any database implementation worth it's name does not crate unnecessary joins
John Nilsson
Yeah, I've tried this before in MySQL. At least in my case, there was no difference in speed
v3
For a simple natural join of two tables there wont be any difference. But, it gets noticeable when you have to join 5-6 tables and then even "smart" database implementation will have to be helped.
Azder
A: 

I'd prefer first query. SQL server will use the best join type for your query based on indexes you have, after that will apply WHERE clause. But you can run both queries at the same time, look at execution plans, compare and choose the fastest (optimize adding indexes also).

Irina C
keep an eye on the performance of your writes whenever adding indexes. There are methods available for your indexes that make them "write-friendly".
yetanotherdave
+2  A: 

It depends.

SELECT
  t1.foo,
  t2.bar
FROM
  table1 t1
  LEFT JOIN table2 t2 ON t1.SomeId = t2.SomeId
WHERE
  t2.SomeValue IS NULL

is different from

SELECT
  t1.foo,
  t2.bar
FROM
  table1 t1
  LEFT JOIN table2 t2 ON t1.SomeId = t2.SomeId AND t2.SomeValue IS NULL

It is different because the former crosses out all records from t2 that have NULL in t2.SomeValue and those from t1 that are not referenced in t2. The latter crosses out only the t2 records that have NULL in t2.SomeValue.

Just use the ON clause for the join condition and the WHERE clause for the filter.

Tomalak
I used an inner join in my sample though.
jcollum
your sample is different because t2.SomeValue could be null because it didn't join or it did join and the field is null
jcollum
The difference is in the NULL. In the left join, t2.SomeValue will be NULL when it *actually is* NULL, *and* it will be NULL when a record from t1 is unreferenced. The result sets are not equal. The only way to differentiate the two is to use the ON clause.
Tomalak
i see your point
jcollum
Excellent point, Tomalak. I only use ON criteria like this when I need to "pre-filter" the records from one of the tables I'm joining.
John M Gant
A: 

unless you are working on a single-user app or something similarly small that creates trivial load, the only considerations that mean anything is how the server will process your query.

The answers that mention query plans give good advice.

In addition, set io statistics on to get an idea of how many reads your query will generate (I especially love Azder's post).

Think of every DB server as a pump of data from disk to client. That pump goes faster if it performs only the IO needed to get the job done. If the data is in cache it will be even faster. But you don't want to be reading more than you need from disk - that will result in crowding out of your cache useful data for no good reason.

yetanotherdave
+1  A: 

Unless moving the join condition to the where clause changes the meaning of the query (like in the left join example above), then it doesn't matter where you put them. SQL will re-arrange them, and as long as they are provably equivalent, you'll get the same query.

That being said, I think it's more of a logical / readability thing. I usually put anything that relates two tables in the join, and anything that filters in the where.

John Gibb
There is a minor caveat to this answer. Theoretically, it makes absolutely no difference where you put your conditions. However, in practice, the optimizer is given a limited amount of time to rearrange the query, so may not have had time to re-order your join conditions. Therefore, in some cases with very complicated queries, it might matter.
John Gibb