tags:

views:

99

answers:

3

Hello, can someone please explain why the addition of the group by subquery makes this query takes so long (30secs):

SELECT *
FROM aggregate_songlist AS a
INNER JOIN musical_works AS m 
ON a.musical_work_id = m.id
WHERE m.genre='rock' AND m.id NOT IN 
(SELECT sources.musical_work_id FROM sources GROUP BY sources.musical_work_id HAVING COUNT(sources.musical_work_id) > 8)

If I remove the 'group by' (and increasing the results of the subquery) it takes 0.07 seconds:

SELECT *
FROM aggregate_songlist AS a
INNER JOIN musical_works AS m 
ON a.musical_work_id = m.id
WHERE m.genre='rock' AND m.id NOT IN 
(SELECT sources.musical_work_id FROM sources)

There are no outer references in the subquery, so it should only be executed once, right? Executing it on its own:

SELECT sources.musical_work_id FROM sources GROUP BY sources.musical_work_id HAVING COUNT(sources.musical_work_id) > 8

takes only 0.01 seconds.

Any explanations? Any suggestions of how to change it?

+2  A: 

The NOT IN is probably your issue. Try joining it instead (you have to flip the HAVING clause around):

SELECT *
FROM aggregate_songlist AS a
INNER JOIN musical_works AS m 
ON a.musical_work_id = m.id
LEFT JOIN (
      SELECT sources.musical_work_id FROM sources 
      GROUP BY sources.musical_work_id   
      HAVING COUNT(sources.musical_work_id) <= 8) AS t
ON m.id = t.musical_work_id
WHERE m.genre='rock' AND t IS NULL

[updated to reflect @Mark Byers comment, thanks!]

Scott Stafford
This will not return rows that exist in the `a JOIN m` but are completely absent in `source`. Using `LEFT JOIN ... WHERE ... IS NULL` will correct that.
Mark Byers
@Mark Byers: Thanks, I updated my answer to reflect your suggestion.
Scott Stafford
+6  A: 

There are no outer references in the subquery, so it should only be executed once, right?

You would think so, but no. If you look at EXPLAIN you will see that the subquery is called a "DEPENDENT SUBQUERY" instead of "SUBQUERY". This means it is re-executed each time. This is a known bug in MySQL 5.0 and is fixed in MySQL 6.0.

To work around it you can use one of the other approaches to check if a row doesn't exist in another table. The three common methods are NOT IN, NOT EXISTS, and LEFT JOIN ... WHERE ... IS NULL, so you still have two options left.

Mark Byers
thanks. a NOT EXISTS did the trick.
A: 
SELECT *
FROM 
aggregate_songlist AS a
INNER JOIN musical_works AS m 
ON a.musical_work_id = m.id
LEFT JOIN (
      SELECT sources.musical_work_id FROM sources 
      GROUP BY sources.musical_work_id   
      HAVING COUNT(sources.musical_work_id) <= 8)
AS t
ON m.id = t.musical_work_id
WHERE
m.genre='rock' AND
t.musical_work_id IS NULL
gajendra.bang