ansaurus

Question

How does the IN predicate work in SQL?

Answer 1

A:

Not really. But it's butter to write such queries using JOIN

Koistya Navin 2009-04-17 16:33:48

Answer 2

+4 A:

Depends on optimizer. Check exact query plan for each particular query to see how the RDBMS will actually execute that.

In Oracle that'd be:

EXPLAIN PLAN FOR «your query»

In MySQL or PostgreSQL

EXPLAIN «your query»

vartec 2009-04-17 16:34:18

@vartec - any resources on reading query plans? This is a blind side of my knowledge and I've yet to find a decent article online.

Gavin Miller 2009-04-17 16:36:12

it's DBMS dependent, but basically that gives you idea what basic operation it will do and in which order, which index will it use etc.

vartec 2009-04-17 16:39:32

I think that what you'd have to look for, would be article about DB tuning, as this is a purpose of looking at query plans in the first place.

vartec 2009-04-17 16:42:22

See this SO question: http://stackoverflow.com/questions/79266/how-do-you-interpret-a-querys-explain-plan

Nathan Koop 2009-04-17 16:43:13

I added a new question: http://stackoverflow.com/questions/761204/list-of-resources-for-database-tuning

Gavin Miller 2009-04-17 16:45:55

good question actually, I've never though about how do I know about this stuff. Guess it came with experience ;-)

vartec 2009-04-17 16:56:17

Answer 3

+7 A:

It will entirely depend on the database you're using, and the exact query.

Query optimisers are very smart at times - in your sample query, I'd expect the better databases to be able to use the same sort of techniques that they do with a join. More naive databases may just execute the same query many times.

Jon Skeet 2009-04-17 16:34:20

I agree, I just did a test query and I believe (I'm not great at reading ExecutionPlans) it created an inner join.

Nathan Koop 2009-04-17 16:36:55

Answer 4

+3 A:

Most SQL engines nowadays will almost always create the same execution plan for LEFT JOIN, NOT IN and NOT EXISTS

I would say look at your execution plan and find out :-)

Also if you have NULL values for the Table1Id column you will not get any data back

SQLMenace 2009-04-17 16:34:30

Answer 5

+3 A:

This depends on the RDBMS in question.

See detailed analysis here:

In short:

MySQL will optimize the query to this:

SELECT  COUNT(*)
FROM    Table1 t1
WHERE   NOT EXISTS
        (
        SELECT  1
        FROM    Table2 t2
        WHERE   t2.id_user = 1
                AND t2.Table1ID = t1.Table2ID
        )

and run the inner subquery in a loop, using the index lookup each time.

SQL Server will use MERGE ANTI JOIN.

The inner subquery will not be "executed" in a common sense of word, instead, the results from both query and subquery will be fetched concurrently.

See the link above for detailed explanation.
Oracle will use HASH ANTI JOIN.

The inner subquery will be executed once, and a hash table will be built from the resultset.

The values from the outer query will be looked up in the hash table.
PostgreSQL will use NOT (HASHED SUBPLAN).

Much like Oracle.

Note that rewriting the query as this:

SELECT  (
        SELECT  COUNT(*)
        FROM    Table1
        ) - 
        (
        SELECT  COUNT(*)
        FROM    Table2 t2
        WHERE   (t2.id_user, t2.Table1ID) IN
                (
                SELECT  1, Table1ID
                FROM    Table1
                )
        )

will greatly improve the performance in all four systems.

Quassnoi 2009-04-17 16:37:42

Last time I looked, SQL Server only allows a single column to be returned from a subquery that is then used in an IN predicate. Has that changed?

RolandTumble 2009-07-29 16:48:45

@RolandTumble: no. `MySQL` and `PostgreSQL` allow that, `SQL Server` and `Oracle` don't. For the latter two systems, the `IN` predicate should be rewritten as `EXISTS`.

Quassnoi 2009-07-29 16:51:11

Answer 6

A:

Yes, but execution stops as soon as the query processer "finds" the value you are looking for... So if, for example the first row in the outer select has Table1Id = 32, and if Table2 has a record with a TableId = 32, then as soon as the subquery finds the row in Table2 where TableId = 32, it stops...

Charles Bretana 2009-04-17 16:38:06

Answer 7

+12 A:

The warning you got about subqueries executing for each row is true -- for correlated subqueries.

SELECT COUNT(*) FROM Table1 a 
WHERE a.Table1id NOT IN (
  SELECT b.Table1Id FROM Table2 b WHERE b.id_user = a.id_user
);

Note that the subquery references the id_user column of the outer query. The value of id_user on each row of Table1 may be different. So the subquery's result will likely be different, depending on the current row in the outer query. The RDBMS must execute the subquery many times, once for each row in the outer query.

The example you tested is a non-correlated subquery. Most modern RDBMS optimizers worth their salt should be able to tell when the subquery's result doesn't depend on the values in each row of the outer query. In that case, the RDBMS runs the subquery a single time, caches its result, and uses it repeatedly for the predicate in the outer query.

PS: In SQL, IN() is called a "predicate," not a statement. A predicate is a part of the language that evaluates to either true or false, but cannot necessarily be executed independently as a statement. That is, you can't just run this as an SQL query: "2 IN (1,2,3);" Although this is a valid predicate, it's not a valid statement.

Bill Karwin 2009-04-17 17:35:59

+1 Great answer; Also thanks for the terminology lesson!

Gavin Miller 2009-04-17 17:58:23

In MySQL, the subquery will be optimized to NOT EXISTS (which is DEPENDENT SUBQUERY), and will use index access on `INDEX ON Table2 (id_user, table1ID)` if it exists.

Quassnoi 2009-04-17 18:05:15

ansaurus

tags:

views:

answers:

How does the IN predicate work in SQL?

related questions