views:

61

answers:

4

I have a query that looks something like

SELECT to_number FROM sent_texts 
WHERE to_number NOT IN(SELECT mobile FROM action_6_members);

A WHERE is applied to the result set after the query is complete.
What would the effect be (improve/degrade) if the sub query contained

WHERE mobile = to_number

A HAVING is applied to the result set during the query.
What would the effect be (improve/degrade) if the sub query contained

HAVING mobile = to_number

What are the pros/cons of just using the original query?


Update
It seems my initial thoughts were wrong, thanks to Bill Karwin's answer.

So Im going to update this with the explain of the original query.

This query is causing me my server to use 100% of the cpu.

Maybe someone can say why, and how to fix it?

id  select_type         table             type   possible_keys  key        key_len  ref  rows    Extra                     
2   DEPENDENT SUBQUERY  action_6_members  index                 mobile     42            179218  Using where; Using index  
1   PRIMARY             sent_txts         index                 to_number  123           256066  Using where; Using index   

This is the explain based off the join(after some more optimization)

id  select_type  table             type   possible_keys  key        key_len  ref  rows    Extra                                 
1   SIMPLE       sent_txts         index                 to_number  78            256066  Using index                           
1   SIMPLE       action_6_members  index                 mobile     27            179218  Using where; Using index; Not exists  
A: 

Both expressions return the same result.

Csaryus
They will get to the same result yes, But I am talking about the performance side of things.
Hailwood
+5  A: 

Just use the original query. MySQL optimizes this case all right, especially if mobile is an indexed column. It runs the non-correlated subquery once, and compares to_number to the set of mobile numbers reasonably efficiently.

I don't know where you got the ideas about WHERE conditions being applied after the query and HAVING conditions being applied during the query. This is not accurate.

Think of it this way:

  • WHERE conditions eliminate rows from the result set. This is done during the query.

  • HAVING conditions eliminate groups from the result set. This is also done during the query, but after GROUP BY has collected rows into groups.

    You should not use HAVING if you aren't using GROUP BY.

Bill Karwin
A: 

Maybe this. SELECT to_number FROM sent_texts WHERE NOT EXISTS(SELECT mobile FROM action_6_members where mobile = to_number);

I'm not sure if it will help. Try to compare the execution plans.

Søren Randrup
A: 

I think what you're looking for with your WHERE mobile = to_number is a correlated subquery:

SELECT to_number
FROM sent_texts
WHERE NOT EXISTS (
    SELECT 1
    FROM action_6_members
    WHERE mobile=sent_texts.to_number
);

In general correlated subqueries are undesirable. In this case though, MySQL should optimise both the correlated and the original subquery down to the same kind of DEPENDENT SUBQUERY. You can check this with your data and indexes using EXPLAIN SELECT ....

In any case you are typically better off using joins rather than subqueries in MySQL, when you can. This null-join:

SELECT to_number
FROM sent_texts
LEFT JOIN action_6_members ON action_6_members.mobile=to_number
WHERE action_6_members.mobile IS NULL

Executed 33% faster than either subquery approach for me (may differ for your data; for small data the difference may not be measurable).

bobince
I just tried this also, Same result, server uses 100% cpu
Hailwood
The `EXPLAIN` output in your update is a bit odd. It claims to be `Using index`, but the number of rows scanned in the subquery says otherwise and the `possible_keys` and `refs` columns are just missing. Do you have indexes on the `to_number` and `mobile` columns? You're going to need indexes to make this kind of query tolerable whether you use joins or subqueries.
bobince
Yeah, I have a btree index on both.I just went And did some more optimization, So I will post a explain on your query
Hailwood
Maybe post the schema? Ensure that the columns have the same exact datatype (including collation)? If they're even slightly different it prevents the indexes being used togethers.
bobince