views:

48

answers:

4

MySql question:

What is the difference between [NOT] IN and [NOT] EXIST when doing subqueries in MySql.

+3  A: 

They work very differently:

  • EXISTS takes a single argument which should be a subquery (derived table) and checks if there is at least one row returned by the subquery.
  • IN takes two arguments, the first of which should be a single value (or a tuple), and the second of which is a subquery or a tuple and checks if the first value is contained in second.

However both can be used to check if a row in table A has a matching row in table B. Unless you are careful and know what you are doing I would stay clear of IN in MySQL as it often gives much poorer performance on more complex queries. Use NOT EXISTS or a LEFT JOIN ... WHERE ... IS NULL.

Mark Byers
+2  A: 

This is more about SQL execution in general. There's a good article at blogspot that describes the different execution plans with examples.

AndiDog
+1  A: 

Here is all the info you could want about the issue link to info

Jason
+2  A: 

EXISTS

EXISTS literally is for checking for the existence of specified criteria. In current standard SQL, it will allow you to specify more than one criteria for comparison - IE if you want to know when col_a and col_b both match - which makes it a little stronger than the IN clause. MySQL IN supports tuples, but the syntax is not portable, so EXISTS is a better choice both for readability and portability.

The other thing to be aware of with EXISTS is how it operates - EXISTS returns a boolean, and will return a boolean on the first match. So if you're dealing with duplicates/multiples, EXISTS will be faster to execute than IN or JOINs depending on the data and the needs.

IN

IN is syntactic sugar for OR clauses. While it's very accommodating, there are issues with dealing with lots of values for that comparison (north of 1,000).

NOT

The NOT operator just reverses the logic.

Subqueries vs JOINs

The mantra "always use joins" is flawed, because JOINs risks inflating the result set if there is more than one child record against a parent. Yes, you can use DISTINCT or GROUP BY to deal with this, but it's very likely this renders the performance benefit of using a JOIN moot. Know your data, and what you want for a result set - these are key to writing SQL that performs well.

To reiterate knowing when and why to know what to use - LEFT JOIN IS NULL is the fastest exclusion list on MySQL if the columns compared are NOT nullable, otherwise NOT IN/NOT EXISTS are better choices.

Reference:

OMG Ponies