See the question "This SELECT query takes 180 seconds to finish" (and check the comments on the question itself).
The IN is compared against only one value, yet the time difference compared to using = is enormous.
Why is that?

+1  A: 

SQL optimizers don't always do what you expect them to do. I'm not sure there's a better answer than that. That's why you have to examine EXPLAIN PLAN output, and profile your queries to find out where the time is spent.
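
As a rough sketch of that workflow in MySQL: the `customers` and `orders` tables below are hypothetical and exist only for illustration, and session profiling via `SET profiling` is assumed to be available on your server version.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE customers (id INT PRIMARY KEY, country CHAR(2) NOT NULL);
CREATE TABLE orders (id INT PRIMARY KEY, customer_id INT NOT NULL);

-- 1. Ask the optimizer how it plans to execute the query.
--    Check the select_type column: DEPENDENT SUBQUERY means the inner
--    query is re-evaluated for each row of the outer query.
EXPLAIN
SELECT id
FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE country = 'NL');

-- 2. Measure where the time actually goes for this session.
SET profiling = 1;

SELECT id
FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE country = 'NL');

SHOW PROFILES;              -- lists recent statements with their durations
SHOW PROFILE FOR QUERY 1;   -- per-stage breakdown of the first profiled statement
```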

Ned Batchelder
+1 for recommending EXPLAIN as a starting point for analyzing query performance.
Cumbayah
+1  A: 

It's about inner queries (a.k.a. subqueries) vs. joins, not about IN vs. =, and the reasons are explained in that post. MySQL version 5.4 is supposed to introduce an improved optimiser that can rewrite some subqueries into a more efficient form.

The worst thing you can do is to use a so-called correlated subquery: http://dev.mysql.com/doc/refman/5.1/en/correlated-subqueries.html
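
For readers unfamiliar with the term, here is a minimal sketch of the difference. The tables `t1` and `t2` and their columns are hypothetical names chosen for illustration.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE t1 (column1 INT, column2 INT);
CREATE TABLE t2 (column1 INT, column2 INT);

-- Correlated subquery: the inner query refers to t1, a table of the
-- outer query, so its result can differ for every outer row.
SELECT *
FROM t1
WHERE column1 = ANY (SELECT column1
                     FROM t2
                     WHERE t2.column2 = t1.column2);

-- Uncorrelated subquery: the inner query refers only to t2, so in
-- principle it needs to be evaluated just once.
SELECT *
FROM t1
WHERE column1 = ANY (SELECT column1
                     FROM t2
                     WHERE t2.column2 = 42);
```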

Mchl
+11  A: 

Summary: This is a known problem in MySQL and will be fixed in MySQL 6.0. The problem is due to a missing optimization: a subquery using IN is incorrectly identified as a dependent subquery instead of an independent subquery.


When you run EXPLAIN on the original query it returns this:

id  select_type           table                   type   possible_keys  key  key_len  ref  rows   Extra
1   'PRIMARY'             'question_law_version'  'ALL'  ''             ''   ''       ''   10148  'Using where'
2   'DEPENDENT SUBQUERY'  'question_law_version'  'ALL'  ''             ''   ''       ''   10148  'Using where'
3   'DEPENDENT SUBQUERY'  'question_law'          'ALL'  ''             ''   ''       ''   10040  'Using where'

When you change IN to = you get this:

id  select_type  table                   type   possible_keys  key  key_len  ref  rows   Extra
1   'PRIMARY'    'question_law_version'  'ALL'  ''             ''   ''       ''   10148  'Using where'
2   'SUBQUERY'   'question_law_version'  'ALL'  ''             ''   ''       ''   10148  'Using where'
3   'SUBQUERY'   'question_law'          'ALL'  ''             ''   ''       ''   10040  'Using where'

Each dependent subquery is run once per row of the query that contains it, whereas a non-dependent subquery is run only once. MySQL can sometimes optimize dependent subqueries when there is a condition that can be converted to a join, but here that is not the case.

Now this of course leaves the question of why MySQL believes that the IN version needs to be a dependent subquery. I have made a simplified version of the query to help investigate this. I created two tables 'foo' and 'bar' where the former contains only an id column, and the latter contains both an id and a foo id (though I didn't create a foreign key constraint). Then I populated both tables with 1000 rows:

CREATE TABLE foo (id INT PRIMARY KEY NOT NULL);
CREATE TABLE bar (id INT PRIMARY KEY, foo_id INT NOT NULL);

-- populate tables with 1000 rows in each

SELECT id
FROM foo
WHERE id IN
(
    SELECT MAX(foo_id)
    FROM bar
);

This simplified query has the same problem as before - the inner select is treated as a dependent subquery and no optimization is performed, causing the inner query to be run once per row. The query takes almost one second to run. Changing the IN to = again allows the query to run almost instantly.
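
Two possible rewrites of the simplified query are sketched below, using the foo and bar tables defined above. The first is the = variant just described. The second is an alternative that I am assuming behaves well on MySQL versions of this era, which materialize a derived table in the FROM clause once; the alias names `m` and `max_foo_id` are my own. Check the EXPLAIN output on your server before relying on either.

```sql
-- Variant 1: scalar comparison. MAX() without GROUP BY always yields
-- exactly one row, so = is safe here.
SELECT id
FROM foo
WHERE id = (SELECT MAX(foo_id) FROM bar);

-- Variant 2: join against a derived table, so the aggregate is computed
-- in the FROM clause rather than inside an IN predicate.
SELECT foo.id
FROM foo
JOIN (SELECT MAX(foo_id) AS max_foo_id FROM bar) AS m
  ON foo.id = m.max_foo_id;
```

Variant 2 also generalizes to inner queries that return several rows, where = cannot be used, although duplicate values in the derived table would then produce duplicate rows in the result.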

The code I used to populate the tables is below, in case anyone wishes to reproduce the results.

CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

CALL prc_filler(1000);

INSERT INTO foo SELECT id FROM filler;
INSERT INTO bar SELECT id, id FROM filler;
Mark Byers
Is there a way to force the optimizer to treat a subquery as merely a subquery and not a dependent subquery?
Itay Moav
@Itay Moav: MySQL ought to be able to work out by itself which subqueries are dependent on outer queries. I'm still a little surprised that in this case it thinks the inner query is a dependent query when there is clearly no reference to the original table. I might search the bugs database to see if anyone has reported this issue.
Mark Byers
@Itay Moav: I have simplified the query and replicated the same problem on the simpler query. I have found a bug report in MySQL that describes the exact same problem. The MySQL developers promise a fix. I have updated my answer accordingly. I hope this answers your question fully. PS: +1 for the good question that required me to do some research! :)
Mark Byers
+1 to you for the thorough and good answer
Itay Moav