tags:

views:

363

answers:

5
SELECT i.*, i.id IN (
  SELECT id
  FROM w 
  WHERE w.status='active') AS wish 
FROM i
INNER JOIN r ON i.id=r.id
WHERE r.member_id=1 && r.status='active' 
ORDER BY wish DESC 
LIMIT 0,50

That's a query that I'm trying to run. It doesn't scale well, and I'm wondering if someone here can tell me where I could improve things. I don't join w to r and i because I need to show rows from i that are unrepresented in w. I tried a left join, but it didn't perform too well. This is better, but not ideal yet. All three tables are very large. All three are indexed on the fields I'm joining and selecting on.

Any comments, pointers, or constructive criticisms would be greatly appreciated.

EDIT Addition:

I should have put this in my original question. It's the EXPLAIN as return from SQLYog.

id|select_type       |table|type          |possible_keys|key      |key_len|ref  |rows|Extra|  
1 |PRIMARY           |r    |ref           |member_id,id |member_id|3      |const|3120|Using where; Using temporary; Using filesort  
1 |PRIMARY           |i    |eq_ref        |id           |id       |8      |r.id |1   |  
2 |DEPENDENT SUBQUERY|w    |index_subquery|id,status    |id       |8      |func |8   |Using where


EDIT le dorfier - more comments ...

I should mention that the key for w is (member_id, id). So each id can exist multiple times in w, and I only want to know if it exists.

+1  A: 

Please post the EXPLAIN listing. And explain what the tables and columns mean.

wish appears to be a boolean - and you're ORDERing by it?


EDIT: Well, it looks like it's doing what it's being instructed to do. Cade seems to be thinking expansively on what this all could possibly mean (he probably deserves a vote just for effort.) But I'd really rather you tell us.

Wild guessing just confuses everyone (including you, I'm sure.)


OK, based on new info, here's my (slightly less wild) guess.

SELECT i.*,  
    CASE WHEN EXISTS (SELECT 1 FROM w WHERE id = i.id AND w.status = 'active' THEN 1 ELSE 0 END) AS wish  
FROM i  
INNER JOIN r ON i.id = r.id AND r.status = 'active'  
WHERE r.member_id = 1

Do you want a row for each match in w? Or just to know for i.id , whether there is an active w record? I assumed the second answer, so you don't need to ORDER BY - it's for only one ID anyway. And since you're only returning columns from i, if there are multiple rows in r, you'll just get duplicate rows.

How about posting what you expect to get for a proper answer?

le dorfier
A: 

Have you tried this?

SELECT i.*, w.id as wish FROM i
LEFT OUTER JOIN w ON i.id = w.id
  AND w.status = 'active'
WHERE i.id in (SELECT id FROM r WHERE r.member_id = 1 AND r.status = 'active')
ORDER BY wish DESC
LIMIT 0,50
jmucchiello
It seems to be slower. I should mention that the key for w is (member_id, id). So each id can exist multiple times in w, and I only want to know if it exists.
Jeremy DeGroot
Does that mean you can add w.member_id = 1 to the ON clause?
jmucchiello
+1  A: 

I should have put this in my original question. It's the EXPLAIN as return from SQLYog.
id|select_type|table|type|possible_keys|key|key_len|ref|rows|Extra|
1|PRIMARY|r|ref|member_id,id|member_id|3|const|3120|Using where; Using temporary; Using filesort
1|PRIMARY|i|eq_ref|id|id|8|r.id|1|
2|DEPENDENT SUBQUERY|w|index_subquery|id,status|id|8|func|8|Using where

Jeremy DeGroot
This should be part of the question, not posted as an answer.
Rex M
Not so easy to do, if you only have 33 rep.
le dorfier
Pretty easy when it's your own question.
Rex M
+3  A: 

WHERE x IN () is identical to an INNER JOIN to a SELECT DISTINCT subquery, and in general, a join to a subquery will typically perform better if the optimizer doesn't turn the IN into a JOIN - which it should:

SELECT i.*
FROM i
INNER JOIN (
    SELECT DISTINCT id
    FROM w 
    WHERE w.status = 'active'
) AS wish 
    ON i.id = wish.id
INNER JOIN r
    ON i.id = r.id
WHERE r.member_id = 1 && r.status = 'active' 
ORDER BY wish.id DESC 
LIMIT 0,50

Which, would probably be equivalent to this if you don't need the DISTINCT:

SELECT i.*
FROM i
INNER JOIN w 
    ON w.status = 'active'
    AND i.id = wish.id
INNER JOIN r
    ON i.id = r.id
    AND r.member_id = 1 && r.status = 'active' 
ORDER BY i.id DESC 
LIMIT 0,50

Please post your schema.

If you are using wish as an existence flag, try:

SELECT i.*, CASE WHEN w.id IS NOT NULL THEN 1 ELSE 0 END AS wish
FROM i
INNER JOIN r
    ON i.id = r.id
    AND r.member_id = 1 && r.status = 'active' 
LEFT JOIN w 
    ON w.status = 'active'
    AND i.id = w.id
ORDER BY wish DESC 
LIMIT 0,50

You can use the same technique with a LEFT JOIN to a SELECT DISTINCT subquery. I assume you aren't specifying the w.member_id because you want to know if any members have this? In this case, definitely use the SELECT DISTINCT. You should have an index with id as the first column on w as well in order for that to perform:

SELECT i.*, CASE WHEN w.id IS NOT NULL THEN 1 ELSE 0 END AS wish
FROM i
INNER JOIN r
    ON i.id = r.id
    AND r.member_id = 1 && r.status = 'active' 
LEFT JOIN (
    SELECT DISTINCT w.id
    FROM w 
    WHERE w.status = 'active'
) AS w
    ON i.id = w.id
ORDER BY wish DESC 
LIMIT 0,50
Cade Roux
Not only are there good explanations here of a couple of options, it contains the one that looks best to me: a LEFT JOIN. That should speed things up quite a bit as it should allow things to be optimized much better.
MBCook
+1  A: 
...
ORDER BY wish DESC 
LIMIT 0,50

This appears to be the big expense. You're sorting by a computed column "wish" which cannot benefit from an index. This forces it to use a filesort (as indicated by the EXPLAIN) output, which means it writes the whole result set to disk and sorts it using disk I/O which is very slow.

When you post questions like this, you should not expect people to guess how you have defined your tables and indexes. It's very simple to get the full definitions:

mysql> SHOW CREATE TABLE w;
mysql> SHOW CREATE TABLE i;
mysql> SHOW CREATE TABLE r;

Then paste the output into your question.

It's not clear what your purpose is for the "wish" column. The "IN" predicate is a boolean expression, so it always results in 0 or 1. But I'm guessing you're trying to use "IN" in hopes of accomplishing a join without doing a join. It would help if you describe what you're trying to accomplish.

Try this:

SELECT i.*
FROM i
 INNER JOIN r ON i.id=r.id
 LEFT OUTER JOIN w ON i.id=w.id AND w.status='active'
WHERE r.member_id=1 AND r.status='active'
 AND w.id IS NULL
LIMIT 0,50;

It uses an additional outer join, but it doesn't incur a filesort according to my test with EXPLAIN.

Bill Karwin
The ones in w won't sort to the top, which appears to be what he wants. Effectively, the LIMIT only deals with the output. Might be quicker to INNER JOIN on w to get the wish, LEFT JOIN with NULL to get the non-wish and then UNION.
Cade Roux
It's still not clear what he wants, and the query in his original question is very odd, making me think he hasn't defined the desired result well enough.
Bill Karwin