Hi,

I'm just wondering if all of the following joins are logically equivalent, and if not, why not?

SELECT t1.x, t2.y from t1, t2 where t1.a=t2.a and t1.b=t2.b and t1.c = t2.c;

SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a where t1.b=t2.b and t1.c = t2.c;

SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a and t1.b=t2.b where t1.c = t2.c;

SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a and t1.b=t2.b and t1.c = t2.c;

I guess my real question is: does combining "where" with "on" do something different from just having multiple conditions ANDed together with "on"?

I work with MySQL, in case that makes a difference.

Thanks,

Ben

+5  A: 

They are logically equivalent and should produce the same result. However, the last one is to be preferred as it states more correctly the semantics of the query - i.e. "join tables t1 and t2".
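A quick way to check the equivalence yourself is to run all four forms against the same data. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module with SQLite (not MySQL), and the sample rows are made up; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b INTEGER, c INTEGER, x TEXT);
    CREATE TABLE t2 (a INTEGER, b INTEGER, c INTEGER, y TEXT);
    INSERT INTO t1 VALUES (1, 1, 1, 'x1'), (1, 1, 2, 'x2'), (2, 1, 1, 'x3');
    INSERT INTO t2 VALUES (1, 1, 1, 'y1'), (1, 2, 2, 'y2'), (2, 1, 1, 'y3');
""")

# The four forms from the question, unchanged apart from keyword case.
queries = [
    "SELECT t1.x, t2.y FROM t1, t2 "
    "WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c",
    "SELECT t1.x, t2.y FROM t1 JOIN t2 ON t1.a = t2.a "
    "WHERE t1.b = t2.b AND t1.c = t2.c",
    "SELECT t1.x, t2.y FROM t1 JOIN t2 ON t1.a = t2.a AND t1.b = t2.b "
    "WHERE t1.c = t2.c",
    "SELECT t1.x, t2.y FROM t1 JOIN t2 "
    "ON t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c",
]

# Sort each result so the comparison does not depend on row order.
results = [sorted(conn.execute(q).fetchall()) for q in queries]
all_equal = all(r == results[0] for r in results)
print(all_equal)
```

With an inner join, every condition has to hold for a row pair to survive, so it does not matter whether a given condition sits in ON or in WHERE.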

The WHERE clause should be used for "filtering" results of the join - e.g.

... WHERE t2.some_col > 10

Also, as Constantin has said in another answer, the 4 queries would be different if the join was an OUTER join.
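To illustrate the OUTER join point, here is a minimal sketch, again assuming SQLite through Python's sqlite3 module and made-up rows. It uses only two join columns to keep it short.

```python
import sqlite3

# Hypothetical data: the second t1 row matches t2 on a but not on b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b INTEGER, x TEXT);
    CREATE TABLE t2 (a INTEGER, b INTEGER, y TEXT);
    INSERT INTO t1 VALUES (1, 1, 'x1'), (2, 1, 'x2');
    INSERT INTO t2 VALUES (1, 1, 'y1'), (2, 9, 'y2');
""")

# Both conditions in ON: an unmatched t1 row is kept, padded with NULL.
on_only = conn.execute(
    "SELECT t1.x, t2.y FROM t1 LEFT JOIN t2 "
    "ON t1.a = t2.a AND t1.b = t2.b ORDER BY t1.x"
).fetchall()

# Second condition moved to WHERE: it filters rows after the join.
on_plus_where = conn.execute(
    "SELECT t1.x, t2.y FROM t1 LEFT JOIN t2 "
    "ON t1.a = t2.a WHERE t1.b = t2.b ORDER BY t1.x"
).fetchall()

print(on_only)        # x2 is kept, with None for t2.y
print(on_plus_where)  # x2 is gone
```

In the first query the ON clause decides which t2 rows match, and the outer join keeps every t1 row regardless. In the second, the WHERE clause runs on the joined rows and drops the row where the b values differ.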

Tony Andrews
Would the first query fully execute the cross join, or is the DBMS smart enough to apply the where clause first?
Gavin Miller
SQL Server 2000 and later will treat the top example as an INNER JOIN. Test it and view the execution plan.
Russ Cam
The answer to that depends on the DBMS. I know that in Oracle the optimiser is smart enough to choose the same best execution plan in all 4 cases, but I don't know about other DBMSs like MySQL.
Tony Andrews
Most RDBMSes (MySQL included) will be able to optimize all four forms identically. However, the fourth form is still preferred, both because it's semantically clearer (it makes it more obvious that the conditions are part of the join) and because it provides "insurance" against naive RDBMSes.
Ben Blank
This is my main problem with ANSI. IF, IF, IF someone writes it the way you've said is best, then yes, ANSI can make things clearer. But it doesn't enforce it. Any WHERE clause with tables on both sides of the operator should cause an error. If ANSI did that, I might be a fan.
+2  A: 
Constantin
+2  A: 

Yes, as others have stated, the result is the same from all these queries.

FWIW, you can also use this shorthand syntax when you're doing an equi-join on column names that are the same in both tables:

SELECT t1.x, t2.y from t1 join t2 using (a, b, c);
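As a rough check that the USING form returns the same rows as the ON form, here is a sketch assuming SQLite via Python's sqlite3 module, with made-up sample rows:

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b INTEGER, c INTEGER, x TEXT);
    CREATE TABLE t2 (a INTEGER, b INTEGER, c INTEGER, y TEXT);
    INSERT INTO t1 VALUES (1, 1, 1, 'x1'), (1, 1, 2, 'x2'), (2, 1, 1, 'x3');
    INSERT INTO t2 VALUES (1, 1, 1, 'y1'), (1, 2, 2, 'y2'), (2, 1, 1, 'y3');
""")

# Shorthand: equi-join on the identically named columns a, b and c.
using_rows = sorted(conn.execute(
    "SELECT t1.x, t2.y FROM t1 JOIN t2 USING (a, b, c)"
).fetchall())

# Long form with every equality spelled out in ON.
on_rows = sorted(conn.execute(
    "SELECT t1.x, t2.y FROM t1 JOIN t2 "
    "ON t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c"
).fetchall())

print(using_rows == on_rows)
```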

As for optimization, all of these forms should be optimized the same way. That is, the RDBMS should be smart enough to analyze the WHERE syntax the same way and perform a join, instead of generating a huge intermediate cross-join result and then applying the filtering conditions to it. This type of query is so common that RDBMS implementations commonly recognize and optimize it.

In the case of MySQL, join and where are (kind of) evaluated together. Try using EXPLAIN to analyze your query. If the "type" column indicates "eq_ref" it means it's using an indexed join. This is the best type of join with respect to optimization. If "type" is "ref" it's good too.

You can get these join optimization types whether you put the condition in the JOIN...ON clause or the WHERE clause.
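MySQL's EXPLAIN output (with its "type" column) can't be reproduced without a MySQL server, so the following sketch swaps in an analogous check: SQLite's EXPLAIN QUERY PLAN, run through Python's sqlite3 module. The index name idx_t2_abc is made up for the example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

# SQLite stand-in for MySQL; idx_t2_abc is a hypothetical index name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b INTEGER, c INTEGER, x TEXT);
    CREATE TABLE t2 (a INTEGER, b INTEGER, c INTEGER, y TEXT);
    CREATE INDEX idx_t2_abc ON t2 (a, b, c);
""")

def plan(sql):
    """Return the human-readable detail text of each plan step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column

# Conditions in WHERE (comma join) versus conditions in JOIN ... ON.
where_plan = plan(
    "SELECT t1.x, t2.y FROM t1, t2 "
    "WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c"
)
on_plan = plan(
    "SELECT t1.x, t2.y FROM t1 JOIN t2 "
    "ON t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c"
)

for label, steps in (("WHERE form", where_plan), ("ON form", on_plan)):
    print(label)
    for step in steps:
        print("  ", step)
```

A SEARCH step in a plan means SQLite is doing an index lookup for that table rather than scanning it, which is roughly the role that "ref" and "eq_ref" play in MySQL's EXPLAIN output. Compare the two printed plans to see whether the placement of the conditions changed anything.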

Bill Karwin
Is using 'using' standard SQL or is that specific to some vendor's SQL implementation?
IronGoofy
Yes, the USING syntax is standard SQL. I haven't encountered any brand of database that supports JOIN...ON, but does not support JOIN...USING. Note the parentheses are mandatory after USING, but optional after ON.
Bill Karwin
+1  A: 

They are logically equivalent. However, where you define the join conditions can make a difference as to how many records end up in the temporary table to which the where clause is applied, at least on a DBMS that does not rewrite the first form as a join.

If tables t1 and t2 had 10 records each, the statement

SELECT t1.x, t2.y from t1, t2 where t1.a=t2.a and t1.b=t2.b and t1.c = t2.c;

conceptually produces the 100-record cross product of the two tables' records, and then the where clause is applied.

For

SELECT t1.x, t2.y from t1 join t2 on t1.a=t2.a and t1.b=t2.b and t1.c = t2.c;

only the matching records (at most ten, if the join columns are unique) are in the temporary table before any where clause (none in this case) is applied. The second form can be much faster on large tables if the DBMS does not optimize the first form into a join.
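The logical row counts are easy to check. This sketch assumes SQLite via Python's sqlite3 module and made-up data in which a, b and c are unique per row. It only counts rows in the logical result; it says nothing about what an optimizer actually materializes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b INTEGER, c INTEGER, x TEXT);
    CREATE TABLE t2 (a INTEGER, b INTEGER, c INTEGER, y TEXT);
""")

# Hypothetical data: 10 rows per table, with matching unique keys.
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [(i, i, i, f"x{i}") for i in range(10)],
)
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?, ?)",
    [(i, i, i, f"y{i}") for i in range(10)],
)

# Comma join with no conditions at all: the full cross product.
cross_count = conn.execute("SELECT COUNT(*) FROM t1, t2").fetchone()[0]

# Comma join with the conditions in WHERE.
where_count = conn.execute(
    "SELECT COUNT(*) FROM t1, t2 "
    "WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c"
).fetchone()[0]

# Explicit join with the conditions in ON.
on_count = conn.execute(
    "SELECT COUNT(*) FROM t1 JOIN t2 "
    "ON t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c"
).fetchone()[0]

print(cross_count, where_count, on_count)
```

Two 10-row tables give a 10 x 10 = 100 row cross product, and both filtered forms return the same 10 rows.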

achinda99