views:

64

answers:

3

Hi, I just enabled the slow-log (+not using indexes) and I'm getting hundreds of entries for the same kind of query (only user changes)

SELECT id
     , name 
  FROM `all` 
 WHERE id NOT IN(SELECT id 
                   FROM `picks` 
                  WHERE user=999) 
ORDER BY name ASC;

EXPLAIN gives:

+----+--------------------+-------------------+-------+------------------+--------+---------+------------+------+------------------------------------------+
| id | select_type        | table             | type  | possible_keys    | key    | key_len | ref        | rows | Extra                                    |
+----+--------------------+-------------------+-------+------------------+--------+---------+------------+------+------------------------------------------+
|  1 | PRIMARY            | all               | index | NULL             | name   | 156     | NULL       |  209 | Using where; Using index; Using filesort | 
|  2 | DEPENDENT SUBQUERY | picks             | ref   | user,user_2,pick | user_2 | 8       | const,func |    1 | Using where; Using index                 | 
+----+--------------------+-------------------+-------+------------------+--------+---------+------------+------+------------------------------------------+

Any idea about how to optimize this query? I've tried with a bunch of different indexes on different fields but nothing.

A: 

"not in" and "exists" always bad choices for performance. May be left join with cheking "NULL" will be better try it.

Alexey Sviridov
A: 

I don't necessarily agree that 'not in' and 'exists' are ALWAYS bad performance choices, however, it could be in this situation.

You might be able to get your results using a much simpler query:

SELECT id
     , name 
  FROM `all`
     , 'picks'
 WHERE all.id = picks.id
   AND picks.user <> 999 
ORDER BY name ASC;
northpole
That will get all picks belonging to any user except the 999 one, right? What the query does (but not in an efficient way) is getting the list of 'available' picks but removing the ones already picked by the user.
Adrián
This query wouldn't return anything for rows that don't have a matching entry in `picks`, while the original query would.
Lukáš Lalinský
simple enough fix, just outer join it.
northpole
I'm thinking on doing this on the software side. The same subquery is used a few lines after this one, and may be a good idea to perform the discard-if-exists in memory.
Adrián
How are you going to do a "not in" subquery in memory without selecting *all* rows? That's going to be slower than the sequential scan on the DB server. Seriously, just create an index on picks (id, user) and use the left join query.
Lukáš Lalinský
In fact, I was just thinking about putting the pick.id in an array and looping about every match of the DB, then omitting the ones present in the array. I also use pick.id later, so it's not a problem for me!
Adrián
A: 

This is probably the best way to write the query. Select everything from all and try to find matching rows from picks that share the same id and user is 999. If such a row doesn't exist, picks.id will be NULL, because it's using a left outer join. Then you can filter the results to return only those rows.

SELECT all.id, all.name 
FROM
    all
    LEFT JOIN picks ON picks.id=all.id AND picks.user=999
WHERE picks.id IS NULL
ORDER BY all.name ASC
Lukáš Lalinský