I have a query that looks like this:
    select
        id
        , int1
        , int2
        , (select count(*) from big_table_with_millions_of_rows
           where id between t.int1 and t.int2)
    from myTable t
    where
        ....
This select returns exactly one row. The id used in the inline select is an indexed column (the primary key). If I replace t.int1 and t.int2 with the values of int1/int2 returned by this single row, the query completes in milliseconds. If I execute the query as above, i.e. with references to int1/int2, it takes about 10 minutes. When I run the profiler and look at what actually happens, I see that 99% of the time the engine is busy returning data from the inline query. It looks as though MySQL is actually running the "select ... from big_table_with_millions_of_rows" part of the inline query once, before applying the "where id between t.int1 and t.int2" part to the result. Can this be true? If not, then what is going on? I had always thought that inline SELECTs were potentially hazardous because they are executed row-by-row as the last element of the query, but that in situations like this, where the initial SELECT is indeed highly selective, they can be very efficient. Can anyone shed any light on this?
EDIT: Thanks for the feedback so far. My concern is not so much the row-by-row nature of the inline query as the fact that it seems unable to use the primary key index when faced with variables rather than the same values hardcoded. My guess would be that if ANALYZE has not been run recently, the optimizer assumes it has to do a table scan because it has no knowledge of the data distribution. But shouldn't the fact that the range lookup is on the primary key compensate for that?
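To make the comparison concrete, this is roughly how the two plans could be checked side by side. It is only a sketch: the filter t.id = 1 is a hypothetical stand-in for the WHERE clause elided above, the literals 100 and 200 are made-up stand-ins for the real int1/int2 values, and the aliases b and cnt are added purely for readability.

```sql
-- 1. Plan for the correlated form (the slow one).
--    `t.id = 1` is a hypothetical filter replacing the elided WHERE clause.
EXPLAIN
SELECT t.id
     , t.int1
     , t.int2
     , (SELECT COUNT(*)
          FROM big_table_with_millions_of_rows b
         WHERE b.id BETWEEN t.int1 AND t.int2) AS cnt
  FROM myTable t
 WHERE t.id = 1;

-- 2. Plan for the same count with literals (the fast one).
--    100 and 200 are hypothetical values, not the real int1/int2.
EXPLAIN
SELECT COUNT(*)
  FROM big_table_with_millions_of_rows
 WHERE id BETWEEN 100 AND 200;

-- 3. Refresh the table statistics, then repeat step 1 to see
--    whether the plan for the subquery changes.
ANALYZE TABLE big_table_with_millions_of_rows;
```

In the EXPLAIN output, the type and key columns on the big_table_with_millions_of_rows row should show whether a range scan on PRIMARY is chosen in each case, or whether the correlated form falls back to scanning the whole table or index.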