ansaurus

Question

Subqueries vs joins

Answer 1

+5 A:

Run the explain-plan on each version, it will tell you why.

scotta 2008-09-26 19:00:10

Answer 2

A:

Look at the query plan for each query.

Where in and Join can typically be implemented using the same execution plan, so typically there is zero speed-up from changing between them.

David B 2008-09-26 19:01:05

Haha, I <3 Sql scrubs that down-vote because they don't know how to read query plans.

David B 2008-09-26 19:02:35

I downvoted you for using the word "scrubs"

Paolo Bergantino 2008-09-27 16:09:22

Answer 3

+17 A:

You are running the subquery once for every row whereas the join happens on indexes.

Sklivvz 2008-09-26 19:01:12

I don't think this is true. The SQL engine should run the subquery only once and use the result as a list.

dacracot 2008-09-26 19:05:35

That depends - if the subquery is correlated somehow with the outer query (uses its data), it is executed with each row.

qbeuek 2008-09-26 19:12:38

It's probably true in this instance, but it's not true in general.

David B 2008-09-26 19:14:03

Answer 4

+1 A:

Optimizer didn't do a very good job. Usually they can be transformed without any difference and the optimizer can do this.

Cade Roux 2008-09-26 19:01:34

But not in MySQL

Greg 2008-09-26 19:27:22

Answer 5

+1 A:

With a subquery, you have to re-execute the 2nd SELECT for each result, and each execution typically returns 1 row.

With a join, the 2nd SELECT returns a lot more rows, but you only have to execute it once. The advantage is that now you can join on the results, and joining relations is what a database is supposed to be good at. For example, maybe the optimizer can spot how to take better advantage of an index now.

Joel Coehoorn 2008-09-26 19:02:04

Answer 6

+2 A:

It isn't so much the subquery as the IN clause, although joins are at the foundation of at least Oracle's SQL engine and run extremely quickly.

dacracot 2008-09-26 19:02:52

where in really isnt inherently bad.

Shawn Simon 2008-09-26 19:19:29

Answer 7

+2 A:

The subquery was probably executing a "full table scan". In other words, not using the index and returning way too many rows that the Where from the main query were needing to filter out.

Just a guess without details of course but that's the common situation.

igelkott 2008-09-26 19:02:58

Answer 8

+2 A:

Usually its the result of the optimizer not being able to figure out that the subquery can be executed as a join in which case it executes the subquery for each record in the table rather then join the table in the subquery against the table you are querying. Some of the more "enterprisey" database are better at this, but they still miss it sometimes.

Mark Roddy 2008-09-26 19:03:02

Answer 9

+40 A:

A "correlated subquery" (i.e., one in which the where condition depends on values obtained from the rows of the containing query) will execute once for each row. A non-correlated subquery (one in which the where condition is independent of the containing query) will execute once at the beginning. The SQL engine makes this distinction automatically.

But, yeah, explain-plan will give you the dirty details.

Jeffrey L Whitledge 2008-09-26 19:03:42

Answer 10

+1 A:

This question is somewhat general, so here's a general answer:

Basically, queries take longer when MySQL has tons of rows to sort through.

Do this:

Run an EXPLAIN on each of the queries (the JOIN'ed one, then the Subqueried one), and post the results here.

I think seeing the difference in MySQL's interpretation of those queries would be a learning experience for everyone.

Pete Karl II 2008-09-26 19:05:45

Answer 11

+4 A:

before the queries are run against the dataset they are put through a query optimizer, the optimizer attempts to organize the query in such a fashion that it can remove as many tuples (rows) from the result set as quickly as it can. Often when you use subqueries (especially bad ones) the tuples can't be pruned out of the result set until the outer query starts to run.

With out seeing the the query its hard to say what was so bad about the original, but my guess would be it was something that the optimizer just couldn't make much better. Running 'explain' will show you the optimizers method for retrieving the data.

pfranza 2008-09-26 19:06:24

Answer 12

+3 A:

The where subquery has to run 1 query for each returned row. The inner join just has to run 1 query.

Shawn Simon 2008-09-26 19:18:32

Answer 13

A:

I agree on all saying that the subquery has to be run for each row whereas the inner join will be run once. But still I have seen situation in real-life where the optimizer will convert a subquery to an inner join.. Why didn't it do it in this case?

Andrei Rinea 2008-09-27 11:38:05

Optimizer couldn't because the subquery was correlated.

David B 2008-09-27 16:13:56

Answer 14

+3 A:

Here's an example of how subqueries are evaluated in MySQL 6.0.

The new optimizer will convert this kind of subqueries into joins.

Giuseppe Maxia 2008-09-28 07:19:08

That's a great article on the MySQL 6.0 improved optimizer, thanks

Fire Crow 2009-03-23 15:44:16

Answer 15

A:

Sorry to be a MySQL basher and not being able to offer any constructive advice, but the query optimizer looks like total s*** if it can't handle simple joins/subqueries. Truly, Oracle had similar issues in 8i, but this was fixed almost 10 years ago.

Andrew from NZSG 2008-09-30 07:08:02

ansaurus

tags:

views:

answers:

Subqueries vs joins

related questions