So, I've been seeing a lot of SQL examples on this site. I have a question about the relative performance of inner joins (plain JOIN) and cross joins (SELECT foo FROM bar,baz WHERE). Turns out the question has already been asked:

http://stackoverflow.com/questions/1018822/inner-join-versus-where-clause-any-difference

But there is one point I'd still like clarified, and I didn't see it addressed in the answers.

The question is this:

Assume no fields are NULL. Given two equivalent queries, one formulated like this:

SELECT * FROM t1
JOIN t2 ON t1.t2_id=t2.t1_id AND t2.bar='baz'
WHERE t1.foo='bar'

And the other formulated like this:

SELECT * FROM t1,t2
WHERE t1.foo='bar' AND t1.t2_id=t2.t1_id AND t2.bar='baz'

Is there a difference in their execution time? I'm interested specifically in the case where restrictions are placed on values located in both tables, in addition to the ID-matching to associate like rows. Note that there is no foreign key constraint in this schema.
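A minimal way to check this on a particular engine is to run both forms and compare their query plans. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The schema and sample rows are made up to mirror the question, and SQLite is only an assumed stand-in (other DBMSes expose plans through their own EXPLAIN variants), so this shows what one engine does, not a general guarantee.

```python
import sqlite3


def build_db():
    # Hypothetical schema mirroring the question; no foreign key constraint.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (id INTEGER PRIMARY KEY, t2_id INTEGER, foo TEXT);
        CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER, bar TEXT);
        INSERT INTO t1 (t2_id, foo) VALUES (1, 'bar'), (2, 'bar'), (3, 'qux');
        INSERT INTO t2 (t1_id, bar) VALUES (1, 'baz'), (2, 'nope'), (3, 'baz');
    """)
    return conn


EXPLICIT = """
SELECT * FROM t1
JOIN t2 ON t1.t2_id = t2.t1_id AND t2.bar = 'baz'
WHERE t1.foo = 'bar'
"""

IMPLICIT = """
SELECT * FROM t1, t2
WHERE t1.foo = 'bar' AND t1.t2_id = t2.t1_id AND t2.bar = 'baz'
"""


def plan(conn, sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable step.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = build_db()
explicit_rows = sorted(conn.execute(EXPLICIT).fetchall())
implicit_rows = sorted(conn.execute(IMPLICIT).fetchall())
print("same rows:", explicit_rows == implicit_rows)
print("same plan:", plan(conn, EXPLICIT) == plan(conn, IMPLICIT))
for step in plan(conn, EXPLICIT):
    print(step)
```

If the plans match, the engine has turned both spellings into the same execution strategy, which is the practical meaning of "no difference in execution time".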

I should probably also say that I'm interested in how this extends to more than two tables.

Thanks in advance for your answers, SQL experts!

+3  A: 

Your first example is normally called an explicit join and the second one an implicit join. Performance-wise, they should be equivalent, at least in the popular DBMSes.

Daniel Vassallo
So, the database engine knows enough to order however many joins you're performing in the way that produces the minimally-sized intermediate subset at each step?
Borealid
As far as I know, inner joins can be considered commutative, and you can list them in any order and you will get the same results. The query optimizer will internally determine the ideal order of the joins based on various heuristics.
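To see the commutativity point concretely, the sketch below lists the same three tables in all six FROM orders and checks that the result set never changes. The tables a, b and c and their rows are hypothetical, and Python's sqlite3 is again just a convenient engine. It selects named columns rather than *, because SELECT * would change the column order along with the table order.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, x TEXT);
    CREATE TABLE b (a_id INTEGER, c_id INTEGER);
    CREATE TABLE c (id INTEGER, y TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (1, 10), (2, 20), (2, 30);
    INSERT INTO c VALUES (10, 'c10'), (30, 'c30');
""")

# The join predicates stay fixed; only the order of tables in FROM changes.
WHERE = "a.id = b.a_id AND b.c_id = c.id"
COLUMNS = "a.x, c.y"

results = set()
for order in itertools.permutations(["a", "b", "c"]):
    sql = f"SELECT {COLUMNS} FROM {', '.join(order)} WHERE {WHERE}"
    rows = sorted(conn.execute(sql).fetchall())
    results.add(tuple(rows))

# One distinct result set means every ordering returned the same rows.
print(len(results), results)
```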
Daniel Vassallo
Yes, you will get the same results, but the performance difference can be crazy. For instance, if you have a table which matches none of your join criteria, you will always end up with an empty set, but if you do a bunch of work *before* including that table...
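Borealid's concern applies to an engine that executes joins literally in the written order. The pure-Python sketch below models such an engine: a hypothetical left-deep nested-loop joiner with no optimizer, not how any real DBMS is implemented. The final result is empty either way, but putting the empty table last forces it to build a 10,000-row intermediate result first.

```python
def left_deep_join(order, tables, predicates):
    """Join tables strictly in the given order, with no optimizer.

    Returns (final rows, size of each intermediate result, pairs examined).
    """
    joined = set()
    current = [{}]  # a single empty row: the identity for a cross product
    sizes = []
    pairs_examined = 0
    for name in order:
        joined.add(name)
        # Predicates whose tables are all present can be checked now.
        # Re-checking ones already applied is redundant but harmless.
        ready = [check for needs, check in predicates if needs <= joined]
        next_rows = []
        for left in current:
            for right in tables[name]:
                pairs_examined += 1
                row = {**left, **{f"{name}.{col}": val for col, val in right.items()}}
                if all(check(row) for check in ready):
                    next_rows.append(row)
        current = next_rows
        sizes.append(len(current))
    return current, sizes, pairs_examined


# Hypothetical data: two 100-row tables that match each other completely,
# and one empty table that matches nothing.
tables = {
    "big1": [{"k": 1} for _ in range(100)],
    "big2": [{"k": 1} for _ in range(100)],
    "empty": [],
}
predicates = [
    ({"big1", "big2"}, lambda r: r["big1.k"] == r["big2.k"]),
    ({"big1", "empty"}, lambda r: r["big1.k"] == r["empty.k"]),
]

late = left_deep_join(["big1", "big2", "empty"], tables, predicates)
early = left_deep_join(["empty", "big1", "big2"], tables, predicates)
print("empty table last: ", late[1], late[2])
print("empty table first:", early[1], early[2])
```

With the empty table last, the intermediate sizes are 100, 10,000 and 0 after 10,100 row pairs are examined; with it first, no pairs are examined at all.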
Borealid
I might be wrong, but I think the query optimizer should deal with that in normal circumstances. Do you have any experience with the issue you are mentioning (i.e., a case where a different join order caused a performance difference)?
Daniel Vassallo
Yes, in a database design class I took a while back, we sped up queries by 100x and sometimes more by reordering the joins. How would the query optimizer know what to do? It would have to evaluate the conditions to know what size they'd make the set... And the conditions can even be aggregation functions!
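An optimizer does not have to evaluate the conditions to choose an order; it can estimate result sizes from stored statistics. The sketch below uses the classic textbook approximation for an equi-join, |R ⋈ S| ≈ |R|·|S| / max(V(R,a), V(S,a)), where V(R,a) is the number of distinct values of join column a in R. The tables a, b, c and all statistics are hypothetical, and real optimizers use richer models (such as histograms) and more elaborate search strategies; this only illustrates the principle.

```python
from itertools import combinations

# Hypothetical statistics of the kind an ANALYZE-style command collects.
ROWS = {"a": 1_000, "b": 100_000, "c": 10}
DISTINCT = {  # (table, column) -> number of distinct values
    ("a", "x"): 1_000,
    ("b", "x"): 1_000,
    ("b", "y"): 50_000,
    ("c", "y"): 10,
}
# Equi-join predicates: a.x = b.x and b.y = c.y
JOIN_COLUMN = {frozenset("ab"): "x", frozenset("bc"): "y"}


def estimated_join_size(t, u):
    """Estimate |t JOIN u| from statistics alone; no rows are read."""
    col = JOIN_COLUMN.get(frozenset((t, u)))
    if col is None:
        # No predicate links these tables, so this step is a cross product.
        return ROWS[t] * ROWS[u]
    return ROWS[t] * ROWS[u] / max(DISTINCT[(t, col)], DISTINCT[(u, col)])


estimates = {pair: estimated_join_size(*pair) for pair in combinations(sorted(ROWS), 2)}
for pair, size in estimates.items():
    print(pair, size)
best_first_join = min(estimates, key=estimates.get)
print("start with", best_first_join)
```

Here the estimate says joining b and c first yields about 20 rows versus 100,000 for a and b, so a cost-based planner would start there, using only counts.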
Borealid
@Borealid: Let's see what the other sql experts have to say. In the meantime, I found this answer on SO which is related to this topic: http://stackoverflow.com/questions/228424/in-what-order-are-mysql-joins-evaluated/228468#228468
Daniel Vassallo
Do note that we weren't using MySQL for the class - I don't know on what software our queries were running. It could have been a custom no-optimization engine, since the whole point was to talk about set theory -_-.
Borealid
@Borealid: That would explain a lot then :) Personally, I never had to optimize my queries by reordering the joins. I always left it up to the query optimizer... If there really are cases in popular DBMSes where the order of joins can affect performance, I'd expect them to be remote edge cases.
Daniel Vassallo
On Oracle 6 and earlier, the order in which you wrote the joins, particularly outer joins but also inner ones, had an extremely strong impact on how the query was built and run. Differences of orders of magnitude were common. However, this progressively went away with later versions, and I'd be surprised if it made much difference in a modern system, provided the statistics are up to date.
Cruachan
+1  A: 

Reordering inner-join criteria is extremely easy for the optimizer, and there is very little chance of it getting that wrong. If statistics are out of date, though, all bets are off: it may reorder the joins so that a table with bad statistics is used first. Of course, stale statistics can hurt you even if you choose the order yourself.
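On SQLite, the statistics the optimizer relies on are gathered by the ANALYZE command and stored in the sqlite_stat1 table (PostgreSQL also has ANALYZE; SQL Server has UPDATE STATISTICS). The sketch below, with a made-up table and data, shows that sqlite_stat1 does not exist until ANALYZE runs; rerunning ANALYZE after large data changes is how you keep the statistics from going stale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER, bar TEXT);
    CREATE INDEX t2_bar ON t2 (bar);
""")
# Hypothetical data: 1% of the 1,000 rows have bar = 'baz'.
conn.executemany(
    "INSERT INTO t2 (t1_id, bar) VALUES (?, ?)",
    [(i, "baz" if i % 100 == 0 else "other") for i in range(1000)],
)


def has_stats(conn):
    # sqlite_stat1 is only created the first time ANALYZE runs.
    return conn.execute(
        "SELECT 1 FROM sqlite_master WHERE name = 'sqlite_stat1'"
    ).fetchone() is not None


print("before ANALYZE:", has_stats(conn))
conn.execute("ANALYZE")
print("after ANALYZE:", has_stats(conn))
stat_rows = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
for row in stat_rows:
    # The first number in 'stat' is the row count seen for that index.
    print(row)
```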

At least in SQL Server, the optimizer can often even push inner join criteria down through views and inline table-valued functions so that they can be highly selective as early as possible.
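SQLite does something comparable with simple views: it can flatten the view into the outer query, so a filter written outside the view can still use an index on a table inside it. This is a sketch with a hypothetical schema and view, shown on SQLite rather than SQL Server; the exact plan text varies between SQLite versions, but the outer filter on bar should show up as a search using the t2_bar index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, t2_id INTEGER, foo TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER, bar TEXT);
    CREATE INDEX t1_t2_id ON t1 (t2_id);
    CREATE INDEX t2_bar ON t2 (bar);
    -- The view itself has no filter on t2.bar.
    CREATE VIEW v AS
        SELECT t1.foo, t2.bar
        FROM t1 JOIN t2 ON t1.t2_id = t2.t1_id;
""")

# The filter is applied outside the view; the planner can still push it
# down to t2 and look rows up through the t2_bar index.
plan = [row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM v WHERE bar = 'baz'")]
for step in plan:
    print(step)
```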

Cade Roux
+1  A: 

I think most 'SQL experts' would write the query more like this:

SELECT * 
  FROM t1
       INNER JOIN t2 
         ON t1.t2_id = t2.t1_id 
 WHERE t1.foo='bar'
       AND t2.bar = 'baz';

Specifically:

  • have strong preference for the INNER JOIN syntax (though may choose to omit the INNER keyword);
  • put only the 'join' predicates in the JOIN clause;
  • put the 'filter' predicates in the WHERE clause.

The distinction between a 'join' search condition and a 'filter' search condition is subjective, but there is broad consensus among practitioners.

P.S. What you call a 'cross join' isn't one :) As you say, the two queries are equivalent (both 'logical' inner joins, if you will), but the one that uses the explicit [INNER] JOIN syntax uses what is known as infixed notation.

onedaywhen