views:

1675

answers:

4

I need advice regarding subselect performance in MySQL. For a reason that I can't change, I am not able to use JOIN to create quesry filter, I can only add another AND clause in WHERE.

What is the peformance of:

select tasks.*
from tasks
where 
  some criteria
  and task.project_id not in (select id from project where project.is_template = 1);

compared to:

select tasks.*
from tasks, project
where
  some criteria
  and task.project_id = project.id and project.is_template <> 1;

Note that there is relatively small number of projects whete is_template = 1, and there could be large number of projects where is_template <> 1.

Is there other way to achieve the same result without subselects if I can't change anything but and filter?

+5  A: 

I believe that the second is more efficient as it requires only one select, but to be sure, you should EXPLAIN each query and check the results.

EXPLAIN select tasks.*
from tasks
where 
  some criteria
  and task.project_id not in (select id from project where project.is_template = 1);

EXPLAIN select tasks.*
from tasks, project
where
  some criteria
  and task.project_id = project.id and project.is_template <> 1;
Rob Prouse
Thanks for the EXPLAIN tip.Seems like adding index on project.is_template helps a lot.
Dev er dev
+1  A: 

How much difference there is between the two could depend greatly on what "some criteria" is and what opportunities to use indexes it provides. But note that they are not equivalent in terms of results if there are tasks that don't have projects. The second is equivalent to this:

select tasks.*
from tasks
where 
  some criteria
  and task.project_id in (select id from project where project.is_template <> 1);
ysth
"some criteria" can pretty much reduce the overal number of records returned. task.project_id is required, so these 2 queries are equivalent.I choose "not in" query cause then subselect returns much smaller numer of records then it would if I choose "in".
Dev er dev
A: 

I think the first may scale better:

When you do a join, internally mysql makes a sort of temporary table consisting of the two tables joined according to the join conditions specified. You aren't giving a join condition, so it'll create a temp table with all tasks listed against all projects. I'm fairly sure (but do check with the explain tool) that it does this prior to applying any where clauses.

Result: if there are 10 of each, it'll have 10 * 10 rows = 100. You can see how this gets big as numbers rise. It then applies the where to this temporary table.

By contrast, the subquery selects only the relevant rows from each table.

But unless scaling is a concern, I don't think it really matters.

benlumley
no one agrees with me .... i want some discussion.
benlumley
Try a simple explain to prove yourself wrong.
ysth
A: 

Avoid sub queries like the plague in MySQL versions < 6.0, and I doubt you're using 6.0 considering it's still in the alpha phase of development. AFAIK, the MySQL optimizer does not handle subqueries well at all. Some major work has gone into revamping the optimizer for 6.0 and sub queries are working much better now, but these changes haven't trickled down into the 5.0 or 5.1 series.

Grant Limberg
You are tarring with a wide brush there. Can you make your comment specific to this case? It's hard to see how lack of optimization could cause a substantial difference unless it runs the subquery repeatedly for each task row, and performance testing should whether that's the case.
ysth
In every case I've seen, even when the query returns an unchanging set of rows for the IN operation, the subquery is run for each result of the main query.
Grant Limberg