ansaurus

Question

A subquery that should be independent is not. Why?

Answer 1

+2 A:

Try:

SELECT * FROM files WHERE file_id NOT IN (SELECT reades.file_id FROM reades)

That is: if it's coming up as dependent, perhaps that's because of ambiguity in what file_id refers to, so let's try fully qualifying it.

If that doesn't work, just do:

SELECT files.*
FROM files
LEFT JOIN reades
USING (file_id)
WHERE reades.file_id IS NULL

chaos 2009-09-16 18:21:41

Okay. Yeah, I have no idea why that would be.

chaos 2009-09-16 18:26:24

The first suggestion is also recognized as a dependent subquery. The second works, but is slow.

fuenfundachtzig 2009-09-16 18:28:36

Well, it appears to me that the MySQL parser is wilfully misunderstanding the query - or the note 'DEPENDENT SUBQUERY' means something different from 'correlated sub-query', which is the term normally used for a sub-query that depends on a 'current row value' from the main query.

Jonathan Leffler 2009-09-16 18:28:53

Is `file_id` indexed on both tables?

chaos 2009-09-16 18:29:16

No. If I do index it, your solution is also quick.

fuenfundachtzig 2009-09-16 18:52:29

Answer 2

+2 A:

Try replacing the subquery with a join:

SELECT * 
FROM files f
LEFT OUTER JOIN reades r on r.file_id = f.file_id
WHERE r.file_id IS NULL

Here's a link to an article about this problem. The writer of that article wrote a stored procedure to force MySQL to evaluate subqueries as independant. I doubt that's necessary in this case though.

Andomar 2009-09-16 18:25:05

Hm. EXPLAIN identifies no correlated queries, but it's still very slow.

fuenfundachtzig 2009-09-16 18:27:42

This should be fast on any decent machine. Consider adding an index on reades.file_id, or making it a foreign key.

Andomar 2009-09-16 18:29:16

I added a foreign key (ALTER TABLE reades ADD FOREIGN KEY (file_id) REFERENCES files (file_id);) and it's much faster now. (Though I thought mySQL did not support foreign keys?!)

fuenfundachtzig 2009-09-16 18:38:03

Creating a foreign key implies creating an index, which speeds things up considerably.

Andomar 2009-09-16 18:40:29

Answer 3

+2 A:

i've seen this before. it's a bug in mysql. try this:

SELECT * FROM files WHERE file_id NOT IN (SELECT * FROM (SELECT file_id FROM reades))

there bug report is here: http://bugs.mysql.com/bug.php?id=25926

longneck 2009-09-16 18:32:10

found it: http://bugs.mysql.com/bug.php?id=25926

longneck 2009-09-16 18:36:04

+1 Interesting link; you could edit your question, instead of adding the link as a comment.

Andomar 2009-09-16 18:39:18

Answer 4

A:

Does MySQL support EXISTS in the same way that MSSQL would? If so, you could rewrite the query as

SELECT * FROM files as f WHERE file_id NOT EXISTS (SELECT 1 FROM reades r WHERE r.file_id = f.file_id)

Using IN is horribly inefficient as it runs that subquery for each row in the parent query.

Chad 2009-09-16 18:40:09

Answer 5

A:

Looking at this page I found two possible solutions which both work. Just for completeness I add one of those, similar to the answers with JOINs shown above, but it is fast even without using foreign keys:

  SELECT * FROM files AS f 
    INNER JOIN (SELECT DISTINCT file_id FROM reades) AS r 
    ON f.file_id = r.file_id

This solves the problem, but still this does not answer my question :)

EDIT: If I interpret the EXPLAIN output correctly, this is fast, because the interpreter generates a temporary index:

+----+-------------+------------+--------+---------------+---------+---------+-----------+------+--------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref       | rows | Extra                    |
+----+-------------+------------+--------+---------------+---------+---------+-----------+------+--------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL      |  843 |                          |
|  1 | PRIMARY     | f          | eq_ref | PRIMARY       | PRIMARY | 4       | r.file_id |    1 |                          |
|  2 | DERIVED     | reades     | range  | NULL          | file_id | 5       | NULL      |  811 | Using index for group-by |
+----+-------------+------------+--------+---------------+---------+---------+-----------+------+--------------------------+

fuenfundachtzig 2009-09-16 18:40:48

If you have to run DISTINCT for your `READES.file_id`, that's information we wouldn't be aware of.

OMG Ponies 2009-09-16 18:42:51

Does "MySQL has a sloppy query optimizer" answer your question? :P

Andomar 2009-09-16 18:49:49

I wouldn't have imagined that indexing a column has such an impact in this kind of queries... I thought mySQL would recognize the subquery as constant, but obviously one has to be careful with such assumptions.

fuenfundachtzig 2009-09-16 19:09:56

ansaurus

tags:

views:

answers:

A subquery that should be independent is not. Why?

related questions