Is there any difference in the performance of the following three SQL statements (referred to below as #1, #2 and #3)?

SELECT * FROM tableA WHERE EXISTS (SELECT * FROM tableB WHERE tableA.x = tableB.y)

SELECT * FROM tableA WHERE EXISTS (SELECT y FROM tableB WHERE tableA.x = tableB.y)

SELECT * FROM tableA WHERE EXISTS (SELECT 1 FROM tableB WHERE tableA.x = tableB.y)

They should all work and return the same result set. But does it matter whether the inner SELECT selects all columns of tableB, a single column, or just a constant?

Is there a best practice when all three behave identically?
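A quick way to check the claim that all three return the same result set is to run them side by side. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for whatever engine you actually use; the sample rows and the extra column z are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: tableA/tableB and columns x/y come from the
# question; the z column and all row values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (x INTEGER);
    CREATE TABLE tableB (y INTEGER, z TEXT);
    INSERT INTO tableA VALUES (1), (2), (3), (NULL);
    INSERT INTO tableB VALUES (1, 'a'), (1, 'b'), (3, NULL), (NULL, 'c');
""")

VARIANTS = {
    "#1 SELECT *": "SELECT * FROM tableA WHERE EXISTS "
                   "(SELECT * FROM tableB WHERE tableA.x = tableB.y)",
    "#2 SELECT y": "SELECT * FROM tableA WHERE EXISTS "
                   "(SELECT y FROM tableB WHERE tableA.x = tableB.y)",
    "#3 SELECT 1": "SELECT * FROM tableA WHERE EXISTS "
                   "(SELECT 1 FROM tableB WHERE tableA.x = tableB.y)",
}

def run_all(connection):
    """Return each variant's rows, sorted so row order does not matter."""
    return {name: sorted(connection.execute(sql).fetchall())
            for name, sql in VARIANTS.items()}

results = run_all(conn)
for name, rows in results.items():
    print(name, rows)
```

The row with x = 1 comes back once even though tableB has two matching rows, and the NULL row of tableA never matches, because NULL = NULL is not true in SQL. Identical results say nothing about performance, though.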

A: 

#3 should be the best one, as you don't need the returned data anyway. Bringing back the columns will only add extra overhead.

DonOctavioDelFlores
I would hope that any modern database query optimizer would recognize that no data is being used from the subquery and treat all the variants identically.
Dave Costa
A: 

For Oracle, best practice is WHERE EXISTS (SELECT NULL FROM ...

Mitch Wheat
SELECT NULL seems counterintuitive, as NULL signifies the absence of a value. I'm sure the code works in practice, but it just looks odd.
Kibbee
It can be counterintuitive for some people. I have always used this method, but I think that was based on a misconception I was taught: that it required less processing than the other variants. In current versions, this is not true (see Charles Bretana's answer and my comment).
Dave Costa
NULL makes sense because you want nothing: no part of the row.
Would the downvoter please leave a comment. Thanks.
Mitch Wheat
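On the SELECT NULL question: EXISTS only tests whether the subquery produces at least one row, so the value in the select list, even NULL, plays no part. The small SQLite sketch below (in-memory database, hypothetical one-row tableB) evaluates the EXISTS expression on its own. SQLite has no separate boolean type, so the result comes back as 1 or 0.

```python
import sqlite3

# Hypothetical one-row table; only the existence of a matching row matters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableB (y INTEGER);
    INSERT INTO tableB VALUES (1);
""")

def exists_flag(select_list, y_value):
    """Evaluate EXISTS (SELECT <select_list> FROM tableB WHERE y = ?).

    select_list is interpolated into the SQL text, so only pass trusted,
    hard-coded strings such as the ones below.
    """
    sql = f"SELECT EXISTS (SELECT {select_list} FROM tableB WHERE y = ?)"
    return conn.execute(sql, (y_value,)).fetchone()[0]

for select_list in ("NULL", "*", "1", "y"):
    # First value: a row with y = 1 exists; second: no row with y = 2.
    print(select_list, exists_flag(select_list, 1), exists_flag(select_list, 2))
```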
A: 

EXISTS returns a boolean, not actual data; that said, best practice is to use #3.

SQLMenace
Can you explain why you consider that to be best practice?
Dave Costa
+1  A: 

This is one of those questions that verges on initiating some kind of holy war.

There's a fairly good discussion about it here.

I think the answer is probably to use the third option, but the speed increase is so infinitesimal it's really not worth worrying about. It's easily the kind of query that SQL Server can optimise internally anyway, so you may find that all options are equivalent.

inferis
+3  A: 

In SQL Server at least, the smallest amount of data that can be read from disk is a single "page" of disk space. As soon as the processor reads one record that satisfies the subquery predicates, it can stop. The subquery is not executed as though it were standing on its own, with its result then fed into the outer query; it is executed as part of the complete query plan for the whole statement. So when EXISTS is used with a subquery, it really doesn't matter what is in the SELECT clause: nothing is "returned" to the outer query anyway, except a boolean indicating whether a matching record was found.
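The stop-at-first-match behavior described above can be modelled as a semi-join. The sketch below is a conceptual model in plain Python, not SQL Server's actual implementation: a naive nested loop that stops probing the inner rows as soon as one matches. Real engines may use an index seek, a hash or merge semi-join instead, and SQL NULL semantics are ignored here. Nothing from the matching inner row is kept except the fact that it matched, which is why the select list does not matter.

```python
from typing import Iterable, List, Tuple

def semi_join(outer: Iterable[int], inner: List[int]) -> Tuple[List[int], int]:
    """Keep each outer value that has at least one equal inner value.

    Also returns how many inner rows were inspected in total, to show that
    the probe for a given outer value stops at the first match.
    """
    kept: List[int] = []
    probes = 0
    for x in outer:
        for y in inner:
            probes += 1
            if x == y:
                kept.append(x)  # one match is all EXISTS needs to know
                break           # stop scanning inner rows for this outer row
    return kept, probes
```

For example, semi_join([1, 2], [1, 1, 1, 3]) returns ([1], 5): a single probe finds the match for 1, while all four inner rows are scanned before 2 is rejected.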

All three use exactly the same execution plan.

I always use [SELECT * FROM ...] as I think it reads better: it doesn't imply that I want anything in particular returned from the subquery.

EDIT: Per Dave Costa's comment, Oracle also uses the same execution plan for all three options.

Charles Bretana
In Oracle as well, all three variations (plus SELECT NULL) use the exact same execution plan. Notably, even in the SELECT * plan there was no access to tableB, only to the index on the join column, so clearly the optimizer recognizes that it does not need the actual values for the SELECT *.
Dave Costa
@Dave, thanks, I have edited my answer to reflect that.
Charles Bretana
* = nothing? I don't get it. In fact * = everything, NULL = nothing. If you want to "not [imply] that I want something in particular returned", wouldn't SELECT NULL be much, much clearer?
@Mark, inside a WHERE EXISTS subquery, the contents of the SELECT clause are irrelevant. So not only does * = nothing, it also = each thing, anything, everything and/or something; i.e., it doesn't matter. The reason you have to put "something" there is that the SELECT clause syntax requires it.
Charles Bretana
+6  A: 

Definitely #1. It "looks" scary, but the optimizer will do the right thing, and it expresses the intent. There is also a slight typo bonus should you think EXISTS but accidentally type IN: with SELECT *, the IN version fails with an error (as long as tableB has more than one column) instead of silently returning the wrong rows. #2 is acceptable but not expressive. The third option stinks, in my not-so-humble opinion: it's too close to saying "if 'no value' exists" for comfort.
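The typo bonus can be reproduced in SQLite (in-memory database; the tables and rows are hypothetical, and tableB is deliberately given two columns). With SELECT *, the mistyped IN query is rejected because the subquery returns more than one column; with SELECT 1, it runs and silently compares x with the constant 1.

```python
import sqlite3

# Hypothetical tables: tableB deliberately has two columns (y, z).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (x INTEGER);
    CREATE TABLE tableB (y INTEGER, z TEXT);
    INSERT INTO tableA VALUES (1), (2), (3);
    INSERT INTO tableB VALUES (2, 'two'), (3, 'three');
""")

CORRECT = ("SELECT * FROM tableA WHERE EXISTS "
           "(SELECT * FROM tableB WHERE tableA.x = tableB.y)")
# The same query with EXISTS mistyped as "x IN", once per select list.
TYPO_STAR = ("SELECT * FROM tableA WHERE x IN "
             "(SELECT * FROM tableB WHERE tableA.x = tableB.y)")
TYPO_ONE = ("SELECT * FROM tableA WHERE x IN "
            "(SELECT 1 FROM tableB WHERE tableA.x = tableB.y)")

def try_query(sql):
    """Return the rows, or an 'error: ...' string if SQLite rejects the SQL."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"

for label, sql in (("EXISTS", CORRECT), ("IN + *", TYPO_STAR), ("IN + 1", TYPO_ONE)):
    print(label, try_query(sql))
```

Here the intended EXISTS query returns the rows for 2 and 3, the SELECT * typo is refused outright, and the SELECT 1 typo quietly returns no rows at all.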

In general, it's important not to be afraid of writing code that merely looks inefficient if it provides other benefits and does not actually affect performance.

That is, the optimizer will almost always execute the complicated join/select/grouping wizardry you wrote to avoid a simple EXISTS/subquery the same way as the simple version.
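As a small illustration of equivalent formulations, the SQLite sketch below (in-memory database, hypothetical data) writes the same semi-join three ways: EXISTS, IN, and a JOIN with DISTINCT. It only checks that they return the same rows; whether a given optimizer also executes them the same way is something to confirm from its execution plan.

```python
import sqlite3

# Hypothetical data; tableA.id makes every tableA row unique.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, x INTEGER);
    CREATE TABLE tableB (y INTEGER);
    INSERT INTO tableA (x) VALUES (1), (2), (3), (3);
    INSERT INTO tableB VALUES (1), (1), (3);
""")

FORMULATIONS = {
    "exists": "SELECT id, x FROM tableA WHERE EXISTS "
              "(SELECT * FROM tableB WHERE tableA.x = tableB.y)",
    "in": "SELECT id, x FROM tableA WHERE x IN (SELECT y FROM tableB)",
    # The join repeats a tableA row once per matching tableB row, so DISTINCT
    # is needed; that is only safe because id keeps tableA rows distinct.
    "join": "SELECT DISTINCT a.id, a.x FROM tableA AS a "
            "JOIN tableB AS b ON a.x = b.y",
}

results = {name: sorted(conn.execute(sql).fetchall())
           for name, sql in FORMULATIONS.items()}
for name, rows in results.items():
    print(name, rows)
```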

After giving yourself kudos for cleverly rewriting that nasty OR out of a join, you will eventually realize that the optimizer still used the same crappy execution plan it uses for the much easier-to-understand query with the embedded OR anyway.

The moral of the story is: know your platform's optimizer. Try different things and see what is actually being done, because in my experience the rampant knee-jerk assumptions about 'decorative' query optimization are almost always incorrect and irrelevant.

Einstein
+1 because this is the only answer that gave a reason for preferring one option over the other instead of just saying "X is best practice".
Dave Costa
The poster didn't ask for reasons, just best practice!
Mitch Wheat
That's like saying "the question didn't ask me to justify my answer" in a philosophy exam. If you give an answer and want people to take note, you justify it.
Dems
+1  A: 

Execution plan: learn it, use it, love it.

There is no possible way to guess, really.
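Following that advice, one way to compare the variants is to ask the engine for its plans. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as an example; SQL Server and Oracle have their own plan tools, and the table and index names here are hypothetical. The plans are printed rather than asserted identical, because both the wording and the optimizer's choices can vary between SQLite versions.

```python
import sqlite3

# Hypothetical schema with an index on the correlated column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (x INTEGER);
    CREATE TABLE tableB (y INTEGER, z TEXT);
    CREATE INDEX tableB_y ON tableB (y);
""")

SELECT_LISTS = ["*", "y", "1", "NULL"]

def query_plan(select_list):
    """Return the plan's 'detail' strings for one EXISTS variant."""
    sql = ("EXPLAIN QUERY PLAN SELECT * FROM tableA WHERE EXISTS "
           f"(SELECT {select_list} FROM tableB WHERE tableA.x = tableB.y)")
    # 'detail' is the last column in both older and newer SQLite versions.
    return [row[-1] for row in conn.execute(sql).fetchall()]

plans = {s: query_plan(s) for s in SELECT_LISTS}
for select_list, detail in plans.items():
    print(f"SELECT {select_list}:")
    for line in detail:
        print("   ", line)
```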

Thuglife