ansaurus

Question

Answer 1

A:

From my observations, MS SQL Server produces the same query plan for these queries.

aku 2008-09-06 13:08:16

Answer 2

A:

I created a simple query similar to the ones in the question on MSSQL 2005, and the explain plans were different. The first query appears to be faster. I am not a SQL expert, but the estimated plan put 37% of the total cost on query 1 and 63% on query 2. The biggest cost for query 2 appears to be the join. Both queries had two table scans.

Mike Polen 2008-09-06 13:16:42

Answer 3

+9 A:

RDBMSs "rewrite" queries to optimize them, so it depends on the system you're using, and I would guess they end up giving the same performance on most "good" databases.

I suggest picking the one that is clearer and easier to maintain; for my money, that's the first one. It's much easier to debug the subquery, as it can be run independently to check for sanity.
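
As a rough illustration of the debugging point, here is a minimal sketch using Python's built-in sqlite3 module. The question's queries are not reproduced on this page, so the table names tblA and tblB, the column a, the sample rows, and the NOT IN form are all assumptions pieced together from the other answers. SQLite is only a stand-in for whatever RDBMS you actually use.

```python
import sqlite3

# Hypothetical schema and data: tblA, tblB and column a are assumed names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (a INTEGER NOT NULL);
    CREATE TABLE tblB (a INTEGER NOT NULL);
    INSERT INTO tblA (a) VALUES (1), (2), (3), (4);
    INSERT INTO tblB (a) VALUES (2), (4);
""")

SUBQUERY = "SELECT a FROM tblB"
FULL_QUERY = f"SELECT a FROM tblA WHERE a NOT IN ({SUBQUERY}) ORDER BY a"

# Step 1: run the subquery on its own to sanity-check what it returns.
subquery_rows = [row[0] for row in conn.execute(SUBQUERY + " ORDER BY a")]

# Step 2: run the full query once the inner part looks right.
full_rows = [row[0] for row in conn.execute(FULL_QUERY)]

print("subquery:", subquery_rows)
print("full query:", full_rows)
```

The join form has no inner part that can be executed separately like this, which is the maintainability argument in a nutshell.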

Tom 2008-09-06 13:17:00

Answer 4

+2 A:

I second Tom's answer that you should pick the one that is easier to understand and maintain.

The query plan of any query in any database cannot be predicted because you haven't given us indexes or data distributions. The only way to predict which is faster is to run them against your database.
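
To see how much the plan depends on things the question did not tell us, something like the following sketch can help. It uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, which is not the SQL Server tooling discussed in other answers. The schema, the index name idx_tblB_a, and both query texts are assumptions, since the original queries are not shown here.

```python
import sqlite3

# Assumed query shapes; the real ones from the question are not on this page.
QUERIES = {
    "NOT IN": "SELECT a FROM tblA WHERE a NOT IN (SELECT a FROM tblB)",
    "LEFT JOIN": (
        "SELECT tblA.a FROM tblA "
        "LEFT JOIN tblB ON tblA.a = tblB.a "
        "WHERE tblB.a IS NULL"
    ),
}


def query_plans(conn):
    """Return the human-readable plan steps for each query in QUERIES."""
    plans = {}
    for name, sql in QUERIES.items():
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        # The last column of each row holds the textual description.
        plans[name] = [row[-1] for row in rows]
    return plans


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (a INTEGER NOT NULL);
    CREATE TABLE tblB (a INTEGER NOT NULL);
""")

plans_before = query_plans(conn)

# Adding an index (hypothetical name) may change what the optimizer picks.
conn.execute("CREATE INDEX idx_tblB_a ON tblB (a)")
plans_after = query_plans(conn)

for label, plans in (("without index", plans_before), ("with index", plans_after)):
    for name, steps in plans.items():
        print(f"{label} / {name}: {steps}")
```

The exact plan text varies by engine and version, so treat the printout as something to inspect on your own system rather than a fixed result.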

As a rule of thumb, I tend to use sub-selects when I do not need to include any columns from tblB in my select clause. I would definitely go for a sub-select when I want to use the 'in' predicate (and usually for the 'not in' that you included in the question), for the simple reason that these are easier to understand when you or someone else has to come back and change them.

andy47 2008-09-07 08:35:23

Answer 5

A:

The first query will be faster in SQL Server, which I think is slightly counterintuitive: subqueries seem like they should be slower. In some cases (as data volumes increase) an EXISTS may be faster than an IN.

Martynnw 2008-09-11 21:49:41

Answer 6

+3 A:

Non-correlated subqueries are fine. You should go with what describes the data you want. As has been noted, this likely gets rewritten into the same plan, but that isn't guaranteed! What's more, if tables A and B are not 1:1, you will get duplicate tuples from the join query (the IN clause only tests for membership, so it behaves as if an implicit DISTINCT were applied to the subquery), so it's always best to code what you want and actually think about the outcome.
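
A small sketch of the duplicate-row effect, using Python's sqlite3 module. The tables, the column a, and the sample rows are made up for illustration, and the queries are the plain IN and inner join forms rather than the exact queries from the question, which are not shown on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (a INTEGER NOT NULL);
    CREATE TABLE tblB (a INTEGER NOT NULL);
    INSERT INTO tblA (a) VALUES (1), (2), (3);
    -- The value 1 appears twice in tblB, so A and B are not 1:1.
    INSERT INTO tblB (a) VALUES (1), (1), (2);
""")

# Membership test: each row of tblA is returned at most once.
in_rows = [row[0] for row in conn.execute(
    "SELECT a FROM tblA WHERE a IN (SELECT a FROM tblB) ORDER BY a"
)]

# Join: each row of tblA is returned once per matching row in tblB.
join_rows = [row[0] for row in conn.execute(
    "SELECT tblA.a FROM tblA JOIN tblB ON tblA.a = tblB.a ORDER BY tblA.a"
)]

print("IN:  ", in_rows)    # [1, 2]
print("JOIN:", join_rows)  # [1, 1, 2]
```

Adding DISTINCT to the join query would hide the difference, but it is usually clearer to write the form that matches the intent.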

Andy Irving 2008-09-15 16:29:36

Answer 7

A:

It should be noted that these queries will produce different results if TblB.a is not unique.

David B 2008-09-15 17:35:19

Answer 8

+2 A:

Well, it depends on the datasets. From my experience, if you have a small dataset, go for a NOT IN; if it's large, go for a LEFT JOIN. The NOT IN clause seems to be very slow on large datasets.

One other thing I might add is that the explain plans might be misleading. I've seen several queries where the estimated cost was sky high and the query ran in under 1s. On the other hand, I've seen queries with an excellent explain plan that ran for hours.

So all in all do test on your data and see for yourself.
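
A minimal timing harness along those lines, sketched with Python's sqlite3 module. Everything here is an assumption for illustration: the schema, the row counts, and the two query texts. The numbers it prints say nothing about SQL Server or about your data, so swap in your own connection and queries.

```python
import sqlite3
import time


def timed(conn, sql):
    """Run sql, fetch all rows, and return (elapsed_seconds, rows)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, rows


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (a INTEGER NOT NULL);
    CREATE TABLE tblB (a INTEGER NOT NULL);
""")
# Hypothetical data: tblA holds 0..19999, tblB holds the even values only.
conn.executemany("INSERT INTO tblA (a) VALUES (?)", ((i,) for i in range(20000)))
conn.executemany("INSERT INTO tblB (a) VALUES (?)", ((i,) for i in range(0, 20000, 2)))
conn.commit()

NOT_IN = "SELECT a FROM tblA WHERE a NOT IN (SELECT a FROM tblB)"
LEFT_JOIN = (
    "SELECT tblA.a FROM tblA "
    "LEFT JOIN tblB ON tblA.a = tblB.a "
    "WHERE tblB.a IS NULL"
)

time_not_in, rows_not_in = timed(conn, NOT_IN)
time_join, rows_join = timed(conn, LEFT_JOIN)

# Check that both forms agree before comparing their timings.
same_result = sorted(rows_not_in) == sorted(rows_join)

print(f"NOT IN:    {time_not_in:.4f}s, {len(rows_not_in)} rows")
print(f"LEFT JOIN: {time_join:.4f}s, {len(rows_join)} rows")
print("same result:", same_result)
```

A single run is noisy, so repeat the measurement a few times and on realistic data volumes before drawing conclusions.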

Piotr Anders 2008-09-16 08:30:05


subselect vs outer join
