ansaurus

Question

Performance of SQL "EXISTS" usage variants

Answer 1

A:

#3 should be the best one, as you don't need the returned data anyway. Fetching the fields only adds overhead.

DonOctavioDelFlores 2009-01-08 13:38:34

I would hope that any modern database query optimizer would recognize that no data is being used from the subquery and treat all the variants identically.

Dave Costa 2009-01-08 14:54:33

Answer 2

A:

For Oracle, best practice is WHERE EXISTS (SELECT NULL FROM ...

Mitch Wheat 2009-01-08 13:44:15

SELECT NULL seems counterintuitive, as NULL signifies the absence of a value. I'm sure the code works in practice, but it just looks odd.

Kibbee 2009-01-08 14:11:09

It can be counterintuitive for some people. I have always used this method, but I think that was based on a misconception I was taught: that it required less processing than the other variants. In current versions, this is not true (see Charles Bretana's answer and my comment).

Dave Costa 2009-01-08 15:20:57

Null makes sense because you want nothing - no part of the row.

2009-01-08 19:53:29

Would the downvoter please leave a comment. Thanks.

Mitch Wheat 2009-11-18 02:33:08

Answer 3

A:

EXISTS returns a boolean, not actual data. That said, best practice is to use #3.

SQLMenace 2009-01-08 13:44:35

Can you explain why you consider that to be best practice?

Dave Costa 2009-01-08 14:55:05

Answer 4

+1 A:

This is one of those questions that verges on initiating some kind of holy war.

There's a fairly good discussion about it here.

I think the answer is probably to use the third option, but the speed increase is so infinitesimal it's really not worth worrying about. It's easily the kind of query that SQL Server can optimise internally anyway, so you may find that all options are equivalent.

inferis 2009-01-08 13:53:52

Answer 5

+3 A:

In SQL Server at least,

The smallest amount of data that can be read from disk is a single "page" of disk space. As soon as the processor reads one record that satisfies the subquery predicates, it can stop. The subquery is not executed as though it were standing on its own and then included in the outer query; it is executed as part of the complete query plan for the whole statement. So when used as a subquery, it really doesn't matter what is in the SELECT clause: nothing is "returned" to the outer query anyway, except a boolean indicating whether a record was found or not.

All three use the exact same execution plan

I always use [Select * From ... ] as I think it reads better, by not implying that I want something in particular returned from the subquery.

EDIT: From Dave Costa's comment: Oracle also uses the same execution plan for all three options.
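The claim that the select list inside EXISTS is irrelevant can at least be checked for results on a small scale. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine (not SQL Server or Oracle). The tables tableA and tableB and the columns x, y, and payload are made up for illustration, and the four select lists are assumed to cover the variants discussed in this thread.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (x INTEGER);
    CREATE TABLE tableB (y INTEGER, payload TEXT);
    INSERT INTO tableA VALUES (1), (2), (3);
    INSERT INTO tableB VALUES (1, 'a'), (1, 'b'), (2, NULL);
""")

# Only the select list inside EXISTS varies; the rest of the query is fixed.
SELECT_LISTS = ["*", "y", "1", "NULL"]

def rows_for(select_list):
    """Run the outer query with the given select list inside EXISTS."""
    sql = (
        "SELECT x FROM tableA WHERE EXISTS "
        f"(SELECT {select_list} FROM tableB WHERE tableB.y = tableA.x) "
        "ORDER BY x"
    )
    return [row[0] for row in conn.execute(sql)]

results = {s: rows_for(s) for s in SELECT_LISTS}
for select_list, rows in results.items():
    print(f"SELECT {select_list}: {rows}")
```

Every variant, including SELECT NULL, returns the same rows here (x = 1 and x = 2), and the duplicate match for x = 1 does not produce a duplicate result. This only shows that the results agree; whether the plans agree is a separate, engine-specific question.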

Charles Bretana 2009-01-08 14:03:47

In Oracle as well, all three variations (plus SELECT NULL) use the exact same execution plan. Notably, even in the SELECT * plan there was no access to tableB, only to the index on the join column, so clearly the optimizer recognizes that it does not need the actual values for the SELECT *.

Dave Costa 2009-01-08 15:16:23

@Dave, Thx I have edited to reflect

Charles Bretana 2009-01-08 15:20:40

* = nothing? I don't get it. In fact * = everything, NULL = nothing. If you want to "not [imply] that I want something in particular returned", wouldn't SELECT NULL be much, much clearer?

2009-01-08 19:30:05

@Mark, inside a WHERE EXISTS subquery, the contents of the SELECT clause are irrelevant. So not only does * = nothing, it also = each thing, anything, everything, and/or something, i.e. it doesn't matter. The only reason you have to put "something" there is that a SELECT clause requires it.

Charles Bretana 2009-01-08 19:40:36

Answer 6

+6 A:

Definitely #1. It "looks" scary, but the optimizer will do the right thing, and it is expressive of intent. There is also a slight typo bonus should one accidentally think EXISTS but type IN. #2 is acceptable but not expressive. The third option stinks, in my not so humble opinion: it's too close to saying "if 'no value' exists" for comfort.
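One possible reading of the "typo bonus", assuming #1 is the SELECT * form: if IN is typed where EXISTS was meant, a multi-column SELECT * is rejected by the engine, whereas a constant select list runs and silently returns something else. A minimal sketch with Python's sqlite3 module as a stand-in engine; the table and column names are hypothetical.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (x INTEGER);
    CREATE TABLE tableB (y INTEGER, payload TEXT);
    INSERT INTO tableA VALUES (1), (2), (3);
    INSERT INTO tableB VALUES (1, 'a'), (2, 'b');
""")

def run(predicate):
    """Return the matching x values, or the error the engine raised."""
    sql = f"SELECT x FROM tableA WHERE {predicate} ORDER BY x"
    try:
        return [row[0] for row in conn.execute(sql)]
    except sqlite3.Error as exc:
        return exc

SUBQUERY = "FROM tableB WHERE tableB.y = tableA.x"

intended = run(f"EXISTS (SELECT * {SUBQUERY})")   # what was meant
typo_star = run(f"x IN (SELECT * {SUBQUERY})")    # typo with SELECT *
typo_one = run(f"x IN (SELECT 1 {SUBQUERY})")     # typo with a constant

print("intended :", intended)
print("typo, *  :", repr(typo_star))
print("typo, 1  :", typo_one)
```

In this sketch the SELECT * typo fails loudly because the subquery has two columns, while the SELECT 1 typo runs and keeps only the row where x happens to equal 1. The bonus disappears if the subquery table has a single column.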

In general, it's important not to be scared to write code that merely looks inefficient if it provides other benefits and does not actually affect performance.

That is, the optimizer will almost always execute the complicated join/select/grouping wizardry you wrote to avoid a simple EXISTS subquery in the same way as the simple version.

After having given yourself kudos for cleverly rewriting that nasty OR out of a join, you will eventually realize the optimizer still used the same crappy execution plan to resolve the much easier to understand query with the embedded OR anyway.

The moral of the story is: know your platform's optimizer. Try different things and see what is actually being done, because the rampant knee-jerk assumptions regarding 'decorative' query optimization are almost always incorrect and irrelevant, in my experience.

Einstein 2009-01-08 14:46:14

+1 because this is the only answer that gave a reason for preferring one option over the other instead of just saying "X is best practice".

Dave Costa 2009-01-08 15:24:17

The poster didn't ask for reasons, just best practice!

Mitch Wheat 2009-01-08 23:46:31

That's like saying "the question didn't ask me to justify my answer" in a philosophy exam... If you give an answer and want people to take note, you justify it.

Dems 2009-08-23 17:15:02

Answer 7

+1 A:

Execution Plan. Learn it, use it, love it

There is no possible way to guess, really.
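As a minimal sketch of what checking the plan looks like in practice, the snippet below asks SQLite (through Python's sqlite3 module) for its query plan for each select-list variant. SQLite is only a stand-in here; SQL Server and Oracle have their own plan tools, which are not shown. The tables, columns, and index name are hypothetical.

```python
import sqlite3

# Hypothetical schema with an index on the join column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (x INTEGER);
    CREATE TABLE tableB (y INTEGER, payload TEXT);
    CREATE INDEX idx_tableB_y ON tableB (y);
""")

def plan_for(select_list):
    """Return SQLite's plan steps for one EXISTS variant."""
    sql = (
        "EXPLAIN QUERY PLAN "
        "SELECT x FROM tableA WHERE EXISTS "
        f"(SELECT {select_list} FROM tableB WHERE tableB.y = tableA.x)"
    )
    # The human-readable step description is the last column of each row.
    return [row[-1] for row in conn.execute(sql)]

plans = {s: plan_for(s) for s in ["*", "y", "1", "NULL"]}
for select_list, steps in plans.items():
    print(f"# SELECT {select_list}")
    for step in steps:
        print("   ", step)
```

The wording of the steps, and whether they match across variants, depends on the engine and its version, which is the point: compare the actual output on the platform you care about rather than guessing.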

Thuglife 2009-01-08 14:52:16
