views:

3553

answers:

10

For simplicity, assume all relevant fields are NOT NULL.

You can do:

SELECT
    table1.this, table2.that, table2.somethingelse
FROM
    table1, table2
WHERE
    table1.foreignkey = table2.primarykey
    AND (some other conditions)

Or else:

SELECT
    table1.this, table2.that, table2.somethingelse
FROM
    table1 INNER JOIN table2
    ON table1.foreignkey = table2.primarykey
WHERE
    (some other conditions)

Does MySQL process those two the same way?

+37  A: 

INNER JOIN is ANSI syntax which you should use.

It is generally considered more readable, especially when you join lots of tables.

It can also be easily replaced with an OUTER JOIN whenever a need arises.
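Here is a minimal sketch of that swap. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption, though SQLite accepts the same join syntax here. The table and column names follow the question's placeholders, and the rows are made up: one row in table1 has no matching row in table2.

```python
import sqlite3

def make_db():
    """Build a tiny in-memory database shaped like the question's tables.

    Hypothetical sample data: the 'z' row in table1 points at a
    primarykey (99) that does not exist in table2.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table2 (primarykey INTEGER PRIMARY KEY, that TEXT NOT NULL);
        CREATE TABLE table1 (this TEXT NOT NULL, foreignkey INTEGER NOT NULL);
        INSERT INTO table2 VALUES (1, 'a'), (2, 'b');
        INSERT INTO table1 VALUES ('x', 1), ('y', 2), ('z', 99);
    """)
    return conn

def run(conn, join_type):
    # Only the join keyword changes; the ON clause is left exactly as it was.
    sql = f"""
        SELECT table1.this, table2.that
        FROM table1 {join_type} table2
            ON table1.foreignkey = table2.primarykey
        ORDER BY table1.this
    """
    return conn.execute(sql).fetchall()

conn = make_db()
inner_rows = run(conn, "INNER JOIN")
outer_rows = run(conn, "LEFT OUTER JOIN")
print(inner_rows)  # [('x', 'a'), ('y', 'b')]
print(outer_rows)  # [('x', 'a'), ('y', 'b'), ('z', None)]
```

With the comma syntax there is no keyword to change, so the same switch means rewriting the query.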

The WHERE syntax is more oriented toward the relational model.

The result of joining two tables is the Cartesian product of those tables, with a filter applied that keeps only the rows whose joining columns match.

It's easier to see this with the WHERE syntax.

As for your example, in MySQL (and in SQL generally) these two queries are synonymous.
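Both claims can be checked with a small sketch: a join is a filtered Cartesian product, and the two queries are synonymous. The sketch runs both query forms and compares them with a Cartesian product computed in plain Python. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, which is an assumption, and the rows are made up. It shows that the results match, not that MySQL builds identical execution plans.

```python
import itertools
import sqlite3

# Hypothetical sample rows shaped like the question's tables.
TABLE1 = [("x", 1), ("y", 2), ("z", 2)]                 # (this, foreignkey)
TABLE2 = [(1, "a", "p"), (2, "b", "q"), (3, "c", "r")]  # (primarykey, that, somethingelse)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (this TEXT NOT NULL, foreignkey INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE table2 (primarykey INTEGER PRIMARY KEY,"
    " that TEXT NOT NULL, somethingelse TEXT NOT NULL)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?)", TABLE1)
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", TABLE2)

# Implicit (comma) join: the join condition lives in WHERE.
comma_sql = """
    SELECT table1.this, table2.that, table2.somethingelse
    FROM table1, table2
    WHERE table1.foreignkey = table2.primarykey
    ORDER BY table1.this
"""
# Explicit join: the same condition lives in ON.
inner_sql = """
    SELECT table1.this, table2.that, table2.somethingelse
    FROM table1 INNER JOIN table2
        ON table1.foreignkey = table2.primarykey
    ORDER BY table1.this
"""
comma_rows = conn.execute(comma_sql).fetchall()
inner_rows = conn.execute(inner_sql).fetchall()

# Relational-model reading: build the Cartesian product, then keep matching pairs.
product_rows = sorted(
    (this, that, somethingelse)
    for (this, foreignkey), (primarykey, that, somethingelse)
    in itertools.product(TABLE1, TABLE2)
    if foreignkey == primarykey
)

print(comma_rows == inner_rows == product_rows)  # True
```

In MySQL itself, running EXPLAIN on each query is the way to compare the execution plans.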

Note that MySQL also has a STRAIGHT_JOIN clause.

Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.

You cannot control this in MySQL using WHERE syntax.

Quassnoi
+1 for explaining both sides and not giving a "my way is better!" answer.
Beska
Thanks, Quassnoi. You've got a lot of details in your answer; is it fair to say that "yes, those queries are equivalent, but you should use inner join because it's more readable and easier to modify"?
allyourcode
@allyourcode: for `Oracle`, `SQL Server`, `MySQL` and `PostgreSQL`, yes. For other systems, probably too, but you'd better check.
Quassnoi
FWIW, using commas with join conditions in the `WHERE` clause is also in the ANSI standard.
Bill Karwin
@Bill Karwin: the `JOIN` keyword became part of the proprietary dialects more recently than it may seem. It made its way into `Oracle` only in version `9` and into `PostgreSQL` in version `7.2` (both released in `2001`). The keyword appeared as part of adopting the `ANSI` standard, and that's why it is usually associated with `ANSI`, even though the standard also supports the comma as a synonym for `CROSS JOIN`.
Quassnoi
Nevertheless, ANSI SQL-89 specified joins to be done with commas and conditions in a `WHERE` clause (without conditions, a join is equivalent to a cross join, as you said). ANSI SQL-92 added the `JOIN` keyword and related syntax, but comma-style syntax is still supported for backward compatibility.
Bill Karwin
InterBase 4.0 is an example of a commercial RDBMS implementation that supported `JOIN` syntax as early as 1994.
Bill Karwin
+5  A: 

Implicit joins (which is what your first query is known as) become much more confusing, hard to read, and hard to maintain once you need to start adding more tables to your query. Imagine doing that same query and type of join on four or five different tables: it's a nightmare.

Using an explicit join (your second example) is much more readable and easy to maintain.

matt b
I couldn't disagree more. JOIN syntax is extremely wordy and difficult to organize. I have plenty of queries joining 5, 10, even 15 tables using WHERE clause joins, and they are perfectly readable. Rewriting such a query using the JOIN syntax results in a garbled mess. This just goes to show that there is no right answer to this question and that it depends more on what you're comfortable with.
Noah Yetter
Noah, I think you might be in the minority here.
matt b
I gave +1 to both matt and Noah. I like diversity :). I can see where Noah is coming from; inner join doesn't add anything new to the language, and it is definitely more verbose. On the other hand, it can make your WHERE condition much shorter, which usually makes it easier to read.
allyourcode
I personally would go for readability over succinctness
matt b
Is there any performance improvement? Will the performance be different?
Guru
I would assume that any sane DBMS would translate the two queries into the same execution plan; however in reality each DBMS is different and the only way to know for sure is to actually examine the execution plan (i.e., you'll have to test it yourself).
matt b
+1  A: 

They have a different human-readable meaning.

However, depending on the query optimizer, they may have the same meaning to the machine.

You should always code to be readable.

That is to say, if this is a built-in relationship, use the explicit join. If you are matching on weakly related data, use the WHERE clause.

John Gietzen
+5  A: 

The implicit join ANSI syntax is older, less obvious and not recommended.

In addition, relational algebra allows the predicates in the WHERE clause and the INNER JOIN to be interchanged, so even INNER JOIN queries with WHERE clauses can have their predicates rearranged by the optimizer.

I recommend writing the queries in the most readable way possible.

Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.

For example, instead of:

SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
    AND c.State = 'NY'
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
    AND a.Status = 1

Write:

SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
    AND a.Status = 1

But it depends, of course.
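A quick way to confirm that the two forms above are interchangeable for INNER JOINs is to run both and compare the results. This sketch uses SQLite through Python's sqlite3 module as a stand-in database. The column types and rows are hypothetical, since the answer shows only the queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows matching the table/column names used above.
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, State TEXT NOT NULL);
    CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, Status INTEGER NOT NULL);
    CREATE TABLE CustomerAccounts (CustomerID INTEGER NOT NULL, AccountID INTEGER NOT NULL);
    INSERT INTO Customers VALUES (1, 'NY'), (2, 'CA'), (3, 'NY');
    INSERT INTO Accounts VALUES (10, 1), (11, 0), (12, 1);
    INSERT INTO CustomerAccounts VALUES (1, 10), (1, 11), (2, 12), (3, 12);
""")

# Filters placed inside the ON clauses.
in_on = """
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
    AND c.State = 'NY'
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
    AND a.Status = 1
"""

# The same filters moved to the WHERE clause.
in_where = """
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
    AND a.Status = 1
"""

# Without ORDER BY the row order is unspecified, so compare sorted results.
rows_on = sorted(conn.execute(in_on).fetchall())
rows_where = sorted(conn.execute(in_where).fetchall())
print(rows_on == rows_where)  # True
```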

Cade Roux
Your first snippet definitely hurts my brain more. Does anyone actually do that? If I meet someone that does that, is it ok for me to beat him over the head?
allyourcode
I locate the criteria where it makes the most sense. If I'm joining to a temporally consistent snapshot lookup table (and I don't have a view or UDF which enforces the selection of a valid date), I will include the effective date in the join and not in the WHERE because it's less likely to accidentally get removed.
Cade Roux
+7  A: 

Others have pointed out that INNER JOIN helps human readability, and that's a top priority; I agree. Let me try to explain why the join syntax is more readable.

A basic SELECT query is this:

SELECT stuff
FROM tables
WHERE conditions

The SELECT clause tells us what we're getting back; the FROM clause tells us where we're getting it from, and the WHERE clause tells us which ones we're getting.

JOIN is a statement about the tables, how they are bound together (conceptually, actually, into a single table). Any query elements that control the tables - where we're getting stuff from - semantically belong to the FROM clause (and of course, that's where JOIN elements go). Putting joining-elements into the WHERE clause conflates the which and the where-from; that's why the JOIN syntax is preferred.

Carl Manaster
Thanks for clarifying why inner join is preferred, Carl. I think your answer was implicit in the others, but explicit is usually better (yes, I'm a Python fan).
allyourcode
+1  A: 

I'll also point out that the older syntax is more error-prone. In SQL Server, for example, an inner join without an ON clause is a syntax error (MySQL, however, accepts it and treats it as a cross join). If you use the older syntax and forget one of the join conditions in the WHERE clause, you silently get a cross join. Developers often "fix" this by adding the DISTINCT keyword (rather than fixing the join, because they still don't realize the join itself is broken), which may appear to cure the problem but will slow down the query considerably.

Additionally, consider maintenance: if you have a cross join in the old syntax, how does the maintainer know whether you meant to have one (there are situations where cross joins are needed) or whether it was an accident that should be fixed?
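Here is a sketch of that failure mode, using SQLite through Python's sqlite3 module as a stand-in database. The customers/orders/products schema and its rows are hypothetical, chosen only to show how the row counts change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each order references one customer and one product.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY,
                            customer_id INTEGER NOT NULL,
                            product_id  INTEGER NOT NULL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO products  VALUES (7, 'pen'), (8, 'ink'), (9, 'pad');
    INSERT INTO orders    VALUES (100, 1, 7), (101, 2, 8);
""")

correct = conn.execute("""
    SELECT c.name, o.id
    FROM customers c, orders o, products p
    WHERE o.customer_id = c.id
      AND o.product_id = p.id
""").fetchall()

# Same query with the products join condition forgotten:
# every order is now paired with every product (an accidental cross join).
broken = conn.execute("""
    SELECT c.name, o.id
    FROM customers c, orders o, products p
    WHERE o.customer_id = c.id
""").fetchall()

# DISTINCT hides the duplicates, but generally only after they have been produced.
masked = conn.execute("""
    SELECT DISTINCT c.name, o.id
    FROM customers c, orders o, products p
    WHERE o.customer_id = c.id
""").fetchall()

print(len(correct), len(broken), len(masked))  # 2 6 2
```

The "masked" result looks correct, which is exactly why the broken join can go unnoticed.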

Plus (personal rant here), the standard using the explicit joins is 17 years old. Would you write application code using syntax that has been outdated for 17 years? Why do you want to write database code that is?

HLGEM
@HLGEM: While I agree completely that explicit JOINs are better, there are cases when you just need to use the old syntax. A real-world example: ANSI JOINs got into Oracle only in version 9i, which was released in 2001, and until only a year ago (16 years after the standard was published) I had to support a bunch of 8i installations for which we had to release critical updates. I didn't want to maintain two sets of updates, so we developed and tested the updates against all databases, including 8i, which meant we were unable to use ANSI JOINs.
Quassnoi
+1  A: 

ANSI join syntax is definitely more portable.

I'm going through an upgrade of Microsoft SQL Server, and I would also mention that the =* and *= syntax for outer joins is not supported in SQL Server 2005 and later (without compatibility mode).

Benzo
Even in SQL Server 2000, *= and =* could give wrong results and should never be used.
HLGEM
A: 

I know you're talking about MySQL, but anyway: In Oracle 9 explicit joins and implicit joins would generate different execution plans. AFAIK that has been solved in Oracle 10+: there's no such difference anymore.

João Marcus
A: 

The SQL:2003 standard changed some precedence rules so that a JOIN takes precedence over a "comma" join. This can actually change the results of your query, depending on how it is set up. This caused problems for some people when MySQL 5.0.12 switched to adhering to the standard.

So in your example, your queries would work the same. But if you added a third table: SELECT ... FROM table1, table2 JOIN table3 ON ... WHERE ...

Prior to MySQL 5.0.12, table1 and table2 would be joined first, then table3. Now (5.0.12 and on), table2 and table3 are joined first, then table1. It doesn't always change the results, but it can and you may not even realize it.

I never use the "comma" syntax anymore, opting for your second example. It's a lot more readable anyway: the JOIN conditions sit with the JOINs instead of being moved into a separate section of the query.

Brent Baisley
+3  A: 

Applying conditional statements in ON / WHERE

Here I have explained the logical query processing steps.


Reference : Inside Microsoft® SQL Server™ 2005 T-SQL Querying
Publisher: Microsoft Press
Pub Date: March 07, 2006
Print ISBN-10: 0-7356-2313-9
Print ISBN-13: 978-0-7356-2313-2
Pages: 640


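(8)  SELECT (9) DISTINCT (11) <TOP_specification> <select_list>
(1)  FROM <left_table>
(3)    <join_type> JOIN <right_table>
(2)      ON <join_condition>
(4)  WHERE <where_condition>
(5)  GROUP BY <group_by_list>
(6)  WITH {CUBE | ROLLUP}
(7)  HAVING <having_condition>
(10) ORDER BY <order_by_list>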

The first noticeable aspect of SQL that is different than other programming languages is the order in which the code is processed. In most programming languages, the code is processed in the order in which it is written. In SQL, the first clause that is processed is the FROM clause, while the SELECT clause, which appears first, is processed almost last.

Each step generates a virtual table that is used as the input to the following step. These virtual tables are not available to the caller (client application or outer query). Only the table generated by the final step is returned to the caller. If a certain clause is not specified in a query, the corresponding step is simply skipped.

Brief Description of Logical Query Processing Phases

Don't worry too much if the description of the steps doesn't seem to make much sense for now. These are provided as a reference. Sections that come after the scenario example will cover the steps in much more detail.

  1. FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as a result, virtual table VT1 is generated.

  2. ON: The ON filter is applied to VT1. Only rows for which the <join_condition> is TRUE are inserted into VT2.

  3. OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and the next table in the FROM clause until all tables are processed.

  4. WHERE: The WHERE filter is applied to VT3. Only rows for which the <where_condition> is TRUE are inserted into VT4.

  5. GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY clause. VT5 is generated.

  6. CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.

  7. HAVING: The HAVING filter is applied to VT6. Only groups for which the <having_condition> is TRUE are inserted into VT7.

  8. SELECT: The SELECT list is processed, generating VT8.

  9. DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.

  10. ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A cursor is generated (VC10).

  11. TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is generated and returned to the caller.
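Phases 1 through 4 can be sketched in a few lines of Python. This is a toy model, not how SQL Server or MySQL actually execute queries. It handles a single two-table LEFT OUTER JOIN, assumes the left table's rows are distinct, uses made-up customers/orders data, and ignores SQL's three-valued NULL logic in predicates. SQLite (via Python's sqlite3 module) is used only to cross-check the final virtual table.

```python
import itertools
import sqlite3
from collections import Counter

# Hypothetical sample data: customers(custid, city) and orders(orderid, custid).
CUSTOMERS = [("c1", "Madrid"), ("c2", "Madrid"), ("c3", "Granada")]
ORDERS = [(1, "c2"), (2, "c2"), (3, "c3")]

def step1_from(left, right):
    """Phase 1: Cartesian product of the two inputs -> VT1."""
    return [l + r for l, r in itertools.product(left, right)]

def step2_on(vt1, on_pred):
    """Phase 2: keep only rows for which the ON predicate is true -> VT2."""
    return [row for row in vt1 if on_pred(row)]

def step3_outer(vt2, left, right_width):
    """Phase 3 (LEFT OUTER JOIN): add back unmatched left rows, NULL-padded -> VT3.

    Assumes left rows are distinct, so a left row counts as matched if its
    values appear as the prefix of some VT2 row.
    """
    matched = {row[: len(left[0])] for row in vt2}
    outer = [l + (None,) * right_width for l in left if l not in matched]
    return vt2 + outer

def step4_where(vt3, where_pred):
    """Phase 4: keep only rows for which the WHERE predicate is true -> VT4."""
    return [row for row in vt3 if where_pred(row)]

# Row layout: (c.custid, c.city, o.orderid, o.custid)
vt1 = step1_from(CUSTOMERS, ORDERS)
vt2 = step2_on(vt1, lambda row: row[0] == row[3])
vt3 = step3_outer(vt2, CUSTOMERS, right_width=2)
vt4 = step4_where(vt3, lambda row: row[1] == "Madrid")

# Cross-check against a real engine (SQLite as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (custid TEXT PRIMARY KEY, city TEXT NOT NULL)")
conn.execute("CREATE TABLE orders (orderid INTEGER PRIMARY KEY, custid TEXT NOT NULL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", CUSTOMERS)
conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)
sql_rows = conn.execute("""
    SELECT c.custid, c.city, o.orderid, o.custid
    FROM customers AS c
    LEFT OUTER JOIN orders AS o
        ON c.custid = o.custid
    WHERE c.city = 'Madrid'
""").fetchall()

print(len(vt1), len(vt2), len(vt3), len(vt4))  # 9 3 4 3
print(Counter(vt4) == Counter(sql_rows))        # True
```

Counter is used for the comparison because SQL result order is unspecified and the NULL-padded row cannot be sorted alongside integers.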



Therefore, logically, the ON clause of an INNER JOIN filters the data (reducing the row count of the virtual table at that step) before the WHERE clause is applied. Subsequent joins then work with the already-filtered data, which can improve performance. Only after that does the WHERE clause apply its filter conditions. (Keep in mind that this is the logical order; the optimizer may physically execute the query differently, as long as the result is the same.)

(Whether a condition goes in ON or WHERE makes little difference in some cases. It depends on how many tables you have joined and the number of rows in each joined table.)
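One case where the placement clearly does matter is an OUTER JOIN, and the phases above explain why. A predicate in ON is evaluated before step 3 adds the outer rows back, while a predicate in WHERE is evaluated afterwards. This sketch demonstrates the difference with SQLite via Python's sqlite3 module as a stand-in and made-up data: two customers, only one of whom has orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: c1 has no orders; c2 has one order with status 1 and one with status 0.
conn.executescript("""
    CREATE TABLE customers (custid TEXT PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, custid TEXT NOT NULL,
                         status INTEGER NOT NULL);
    INSERT INTO customers VALUES ('c1', 'Madrid'), ('c2', 'Madrid');
    INSERT INTO orders VALUES (1, 'c2', 1), (2, 'c2', 0);
""")

def rows(join_type, status_in_on):
    # Place the same predicate (o.status = 1) either in ON or in WHERE.
    on_extra = " AND o.status = 1" if status_in_on else ""
    where = "" if status_in_on else " WHERE o.status = 1"
    sql = (
        f"SELECT c.custid, o.orderid FROM customers AS c {join_type} orders AS o "
        f"ON c.custid = o.custid{on_extra}{where} ORDER BY c.custid"
    )
    return conn.execute(sql).fetchall()

print(rows("INNER JOIN", True), rows("INNER JOIN", False))  # [('c2', 1)] [('c2', 1)]
print(rows("LEFT OUTER JOIN", True))   # [('c1', None), ('c2', 1)]
print(rows("LEFT OUTER JOIN", False))  # [('c2', 1)]
```

In the WHERE version, the NULL-padded row for c1 fails `o.status = 1` and is filtered out, so the LEFT OUTER JOIN effectively behaves like an INNER JOIN.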

rafidheen