views:

1058

answers:

10

Suppose I have two tables that are linked (one has a foreign key to the other) :

CREATE TABLE Document (
  Id INT PRIMARY KEY,
  Name VARCHAR 255
)

CREATE TABLE DocumentStats (
  Id INT PRIMARY KEY,
  DocumentId INT, -- this is a foreign key to table Document
  NbViews INT
)

I know, this is not the smartest way of doing things, but this is the best example I could come up with.

Now, I want to get all documents that have more than 500 views. The two solutions that come to my mind are :

SELECT *
FROM Document, DocumentStats
WHERE DocumentStats.Id = Document.Id
  AND DocumentStats.NbViews > 500

or :

SELECT *
FROM Document
INNER JOIN DocumentStats
ON Document.Id = DocumentStats.Id
WHERE DocumentStats.NbViews > 500

Are both queries equivalent, or is there one way that is far better than the other ? If so, why ?

I'm aware that my example is not perfect, and that the queries may need some tune-up, but I hope you got the point ;) !

EDIT: as requested in the answers, this question was aimed at MSSQL, but I would be interested in knowing if it is different for other DB engines (MySQL, etc...)

+3  A: 

In MySQL at least, they will both be optimized to the same query.

Greg
+12  A: 

Theoretically, no, it shouldn't be any faster. The query optimizer should be able to generate an identical execution plan. However, some DB engines can produce better execution plans for one of them (not likely to happen for such a simple query but for complex enough ones). You should test both and see (on your DB engine).

Mehrdad Afshari
+1 for recommending testing OPs specific case. Profiling trumps.
Daren Thomas
I agree. I often try out both the JOIN and WHERE versions of a query to see which performs the best. Sometimes JOIN is faster, sometimes WHERE - it depends on the query.
Jon Cram
+1  A: 

In MSSQL, both queries are compiled to the same execution plan, so there's no difference. It's more about readability - I think the JOIN is easier to read, so I use that.

rwmnau
+7  A: 

There is no way to correctly answer this without limiting to a target database.

For MS-SQL both queries result in the same execution plans, but keep in mind:

SELECT *
FROM Document, DocumentStats
WHERE DocumentStats.Id = Document.Id
  AND DocumentStats.NbViews > 500

Is really risky since it is easy to forget the join condition in the WHERE clause and end up with a nasty cross join.

Sam Saffron
Thanks for the tip, I thend to use mainly WHERE but I should move to JOINs...
Wookai
+2  A: 

It is a "standard" to use the INNER JOIN syntax, although practically equivalent. The main reason it should be used is for clarity and mobility purposes as it is consistent with OUTER JOIN syntax.

Sev
A: 

When you use Sqlite: The where-syntax is slightly faster because Sqlite first translates the join-syntax into the where-syntax before executing the query.

Theo
Really interesting. Could you provide a reference for this behaviour, please? This would mean that putting a filtering condition at the ON clause rather than at the WHERE, which can give different results on SQL Server, would be exactly the same here!
Joe Pineda
+1  A: 

I guess that it doesn't make a difference too. To be sure you can check if the explain plan of those two queries is identical. In order to look at the explain plan in MySQL you have to put the "explain" keyword before the statement, eg:

EXPLAIN
SELECT *
FROM Document, DocumentStats
WHERE DocumentStats.Id = Document.Id
  AND DocumentStats.NbViews > 500

I'm sure there exists an equivalent in MSSQL too.

By the way: This looks like this is a 1:1 relationship so I'd just include the nbviews attribute directly in the Document table, therefore you can save a join.

samson
+1  A: 

Explicit joins are easier to maintain as the intent of the query is much clearer. Also they are not subject to accidental cross joins so if you have a cross join in the query, the maintainer knows it was intended to be there.

If you ever need to use outer joins, you should know that the *= syntax is deprecated in SQL Server and will soon be removed. Also it currently doesn't function as expected all the time and may not give correct results and thus should NEVER be used. Mixing explicit outer joins and where clause joins (implicit joins) makes a query much harder for a maintainer to read and understand.

HLGEM
A: 

If you're talking specifically about SQL Server, then you should definitely be using the INNER JOIN syntax. Apart from being (personal opinion alert!) easier to read and more clear in intent, there is, as of SQL Server 2005, no equivalent syntax for outer joins. The *= and =* syntax is not supported by default in 2005 -- you need to enable a compatibility mode to support it. It will eventually be removed, possibly as soon as the next release (or possibly not!)

This means:

  • If you need to change a query from inner join to outer join, you need to either rewrite it (argh) or enable compat mode (yuk)
  • Without compat mode, you can't be consistent with how you implement different types of joins (inner vs outer), making for a maintenance nightmare (and, where the two are combined in the one query, some behaviour that's non-intuitive).

Note also that contrary to popular belief, the two are not equivalent. Some things are much more awkward, and some are simply not possible. Kalen Delaney's Inside SQL Server 2000 covers some examples; not sure if the newer editions do, because that join syntax is deprecated anyway.

Cowan
+1  A: 

Performance of "JOIN" versus "WHERE"... everything hinges on how well the database engine is able to optimize the query for you. It will take into account any indexes you might have on the columns being returned and consider that performance of WHERE and JOIN clauses also come down to the physical database file itself and its fragmentation level and even the storage technology you use to store the database files on.

MSSql server executes queries in the following order (this should give you an idea of the functions of the WHERE and JOIN clauses)

Microsoft Sql Server query process order

the following is taken from the excellent series of books about Microsoft SQL Server, Inside Microsoft SQL Server 2005: T-SQL Querying which can be found here

(Step 8) SELECT (Step 9) DISTINCT (Step 11)
(Step 1) FROM *left_table*
(Step 3) *join_type* JOIN *right_table*
(Step 2) ON *join_condition*
(Step 4) WHERE *where_condition*
(Step 5) GROUP BY *group_by_list*
(Step 6) WITH [CUBE|ROLLUP]
(Step 7) HAVING *having_clause*
(Step 10) ORDER BY *order_by_list*

Mike J
Ah, loved this answer! Found thru trial and error that SQL S. executes first the ON, then WHERE, finally HAVING. Normally doesn't matter, except on certain situations where NULLs are involved - having a filtering condition at the ON or at the WHERE can make a *big* difference in such cases. Now I'd like to ask, is this behaviour ANSI standard mandated or SQL Server-specific? If you're not sure, I might open a question on this!
Joe Pineda
Clarification: by "this behaviour" I mean the behaviour of evaluating first ON, then WHERE. Theo stated in his answer that SQLite first converts JOIN...ON to a WHERE clause. If this is true (don't know, can't verify now) then evaluating ON first, WHERE second might be very specific to SQL Server (possibly also Sybase).
Joe Pineda
This query execution order is how Microsoft SQL server will execute queries.
Mike J