ansaurus

Question

Efficient way to check if a SQL query will return results

Answer 1

+8 A:

Do you have an index on id and date?

Maybe you just want:

select top 1  1 from myTable where id=7 and rowInsertDate > '01/01/2009'

Note: this returns 1 if data exists, or nothing otherwise.

Another edit: if there is no data, this won't return a row with the value NULL; it will not return any rows at all. More like null in its more figurative sense.
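To make the "no rows rather than NULL" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server here, so `LIMIT 1` replaces `TOP 1`; the sample rows, the index name `ix_id_date`, and the helper `first_match` are assumptions made for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server, so LIMIT 1 replaces TOP 1.
# Table and column names follow the thread; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, rowInsertDate TEXT)")
conn.execute("CREATE INDEX ix_id_date ON myTable (id, rowInsertDate)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(7, "2009-03-15"), (7, "2008-06-01"), (8, "2009-05-20")],
)

def first_match(conn, row_id, since):
    """Hypothetical helper: return the probe row, or None when nothing matches."""
    cur = conn.execute(
        "SELECT 1 FROM myTable WHERE id = ? AND rowInsertDate > ? LIMIT 1",
        (row_id, since),
    )
    return cur.fetchone()

hit = first_match(conn, 7, "2009-01-01")   # one row containing 1
miss = first_match(conn, 9, "2009-01-01")  # no row at all, not a NULL value
```

The caller has to test for "no row" (here `None`) rather than for a NULL column value, which is the distinction the comments below pick up on.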

Nathan Feger 2009-11-10 15:34:25

This gives a much bigger clue to the query optimizer, and I would expect it to be much faster.

Ben S 2009-11-10 15:35:34

Clever solution.

Nathan Taylor 2009-11-10 15:36:13

Make sure to use the appropriate platform-dependent limit clause: TOP 1, LIMIT 1, etc.

Kevin 2009-11-10 15:39:14

I am pretty sure this won't return NULL if no record is matched, but no result at all, which is pretty different. This is why `EXISTS` actually exists.

Alex Bagnolini 2009-11-10 15:42:37

Ahh, I see. The query would actually take about 4 input parameters plus a date range, so I simplified it for the question. Cheers dude, nice idea.

Robert 2009-11-10 15:52:00

Answer 2

+2 A:

IF EXISTS should be more efficient, because it is optimised to stop as soon as it finds the first row. This is how I would always do this kind of check, rather than using a COUNT().

For performance comparison, just ensure you are testing fairly by clearing down the data and execution plan caches (non-production db server only) before each test:

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

AdaTheDev 2009-11-10 15:36:18

This depends on the DBMS's query optimizer. If his tests show that it takes 4 seconds with both IF EXISTS and COUNT, then IF EXISTS is likely not optimized.

Ben S 2009-11-10 15:37:56

For now the results are the same, but I was asking more with the future in mind. As the number of rows increases, I wondered whether the count performance would get slower. Someone suggested it would not, since the tables are analyzed and performing counts remains quite quick anyway. I'm tending towards IF EXISTS more and more.

Robert 2009-11-10 15:53:44

I would compare both, with clean cache, checking duration, cpu and reads via SQL Profiler and would expect EXISTS to not only perform better in SQL Server, but also scale better when data volumes grow. With small amounts of data, there will not be much of a difference, but scalability is important.

AdaTheDev 2009-11-10 15:55:28

IF EXISTS is optimised. There are some very good books by Solid Quality Learning (http://www.solidq.com/na/OurBooks.aspx); I've read a number of them and, although I forget which one, one of them gave the best level of detail I've come across on the query optimisation process. IF EXISTS vs. alternatives was a scenario I researched some time ago.

AdaTheDev 2009-11-10 15:58:00

Answer 3

+2 A:

If you don't need 376986 rows and just want to know if something exists then IF EXISTS makes a lot more sense. Also, another helpful bit is to ask for an indexed column (primary key) instead of * because you don't care about the actual data.

statenjason 2009-11-10 15:38:59

Thanks, I'll be sure to only ask for 1 column in my results, since I don't actually want the results.

Robert 2009-11-10 15:54:17

Answer 4

+4 A:

This is the fastest I could get in my projects:

SELECT CASE WHEN EXISTS (
  select top 1 1 
  from myTable 
  where id=7 
  and rowInsertDate BETWEEN '01/01/2009' AND GETDATE()
) THEN 1 ELSE 0 END AS AnyData
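As a rough check of what this shape of query returns, the sketch below runs the same CASE WHEN EXISTS pattern through Python's sqlite3 module. SQLite is an assumption standing in for SQL Server, the inner `TOP 1` is dropped because SQLite has no `TOP`, and the sample row and the helper name `any_data` are made up.

```python
import sqlite3

# SQLite stands in for SQL Server; the single sample row is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, rowInsertDate TEXT)")
conn.execute("INSERT INTO myTable VALUES (7, '2009-03-15')")

def any_data(conn, row_id, since):
    """Hypothetical helper: always yields one row, 1 if a match exists, else 0."""
    cur = conn.execute(
        """
        SELECT CASE WHEN EXISTS (
            SELECT 1 FROM myTable
            WHERE id = ? AND rowInsertDate > ?
        ) THEN 1 ELSE 0 END AS AnyData
        """,
        (row_id, since),
    )
    return cur.fetchone()[0]

found = any_data(conn, 7, "2009-01-01")
missing = any_data(conn, 8, "2009-01-01")
```

Because the outer SELECT always produces exactly one row, the calling code can read the flag directly without first checking whether a row came back.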

Alex Bagnolini 2009-11-10 15:39:21

This is the general premise I am going to use. Thanks

Robert 2009-11-10 15:54:52

Answer 5

A:

First off, you should try to dummy up a database containing as much data as you think you (or your successors) might have to deal with in two years. Then your tests will be a lot more productive.

IF EXISTS() will be faster, since the database engine only has to find the first record matching your criteria. It will of course be faster still with proper indexes.

Another hint, don't use *, since you don't actually need to retrieve columns.

IF EXISTS(select 1 from myTable where id=7 and rowInsertDate BETWEEN '01/01/2009' AND GETDATE())

...should (from what I've read) work a bit faster.

Philip Kelley 2009-11-10 15:41:53

It doesn't matter if you use SELECT *. EXISTS is optimised for that, so it makes no difference.

AdaTheDev 2009-11-10 15:53:23

Answer 6

A:

I would just write it this way:

IF EXISTS(
      SELECT 0 FROM myTable 
      WHERE id=7 and rowInsertDate BETWEEN '01/01/2009' AND GETDATE()
)
SELECT 1
ELSE
SELECT 0

That way you don't return any data; you just check the conditions. I find this query structure super fast.

Ender 2009-11-10 15:43:54

Answer 7

+1 A:

The final results will actually be a far more complex query, taking one to many parameters and the string built up and executed using sp_executesql

I think you need, at least, the full FROM, JOIN and WHERE syntax, otherwise your actual query may find nothing (e.g. by adding an INNER JOIN that was not in the original IF EXISTS query and turns out to not be satisfied).

If you are going to that trouble you might want to get the PKs into some sort of "Batch ID Holding Table" so that you can just reference the PKs for the second "Presentation" part of your query.

What are you planning to do if you get 376,986 results? If you are going to show them to the user on screen, with some sort of paging, then having the results in a "Batch ID Holding Table" might assist with that (although, obviously, any additions / deletions etc. to the underlying data will muck up the paged display).

Alternatively, if you are going to be using paging just use TOP / LIMIT / SET ROWCOUNT to restrict the results to the first page full (make sure you have an ORDER BY so the sequence is repeatable), and then sort out what to do for Page 2 when the user presses the NEXT-PAGE button (we tackle that by the NEXT-PAGE button containing the PK of the last record displayed, in sort-order, so that the Next Page can resume from that point onwards).
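The "resume from the last PK displayed" idea can be sketched as follows, using Python's sqlite3 module as a stand-in for SQL Server (so `LIMIT` replaces `TOP`). The column name `pk`, the helper `fetch_page`, the page size, and the assumption that keys are positive integers sorted in display order are all illustrative.

```python
import sqlite3

# SQLite stands in for SQL Server; seven made-up rows share id = 7.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (pk INTEGER PRIMARY KEY, id INTEGER)")
conn.executemany(
    "INSERT INTO myTable (pk, id) VALUES (?, 7)",
    [(n,) for n in range(1, 8)],
)

def fetch_page(conn, row_id, after_pk=0, page_size=3):
    """Hypothetical helper: return the next page of PKs after `after_pk`.

    Assumes positive integer keys, so 0 means "start from the beginning".
    ORDER BY makes the sequence repeatable between requests.
    """
    cur = conn.execute(
        "SELECT pk FROM myTable WHERE id = ? AND pk > ? ORDER BY pk LIMIT ?",
        (row_id, after_pk, page_size),
    )
    return [row[0] for row in cur.fetchall()]

page1 = fetch_page(conn, 7)
# The NEXT-PAGE button would carry the last PK shown on the current page.
page2 = fetch_page(conn, 7, after_pk=page1[-1])
```

Each request only reads one page's worth of rows, and an empty list signals that there is nothing further to show.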

The Query Optimiser will do different things depending on what the SELECT list is, so asking "IF EXISTS" followed by "SELECT Col1, Col2, ... FROM ..." may in effect mean that you run the complete query twice, differently, using different cached data and query plans. Overall that may be more of a strain on your server, and cause the users to wait longer, than just getting the first page / 100 rows etc.

SQL Server will cache the query plan for sp_ExecuteSQL, but make sure you parameterise the query so that the cached plan is reused where possible.
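One way to picture the parameterisation advice is the sketch below, which builds an existence check from optional filters while keeping every value out of the SQL text. It uses Python's sqlite3 module with `?` placeholders as a stand-in; with sp_executesql the same idea would use named @parameters instead. The function `build_exists_query` and its allow-lists are hypothetical.

```python
import sqlite3

def build_exists_query(filters):
    """Hypothetical builder: existence check from optional column filters.

    `filters` maps an allow-listed column name to (operator, value).
    Only placeholders enter the SQL text, never the values themselves.
    """
    allowed_columns = {"id", "rowInsertDate"}
    allowed_ops = {"=", ">", "<", ">=", "<="}
    clauses, params = [], []
    for column, (op, value) in sorted(filters.items()):
        if column not in allowed_columns or op not in allowed_ops:
            raise ValueError(f"unsupported filter: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) or "1 = 1"
    sql = (
        "SELECT CASE WHEN EXISTS "
        f"(SELECT 1 FROM myTable WHERE {where}) THEN 1 ELSE 0 END"
    )
    return sql, params

# SQLite stands in for SQL Server; the single sample row is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, rowInsertDate TEXT)")
conn.execute("INSERT INTO myTable VALUES (7, '2009-03-15')")

sql_a, params_a = build_exists_query(
    {"id": ("=", 7), "rowInsertDate": (">", "2009-01-01")}
)
sql_b, params_b = build_exists_query(
    {"id": ("=", 8), "rowInsertDate": (">", "2009-01-01")}
)
exists_a = conn.execute(sql_a, params_a).fetchone()[0]
exists_b = conn.execute(sql_b, params_b).fetchone()[0]
```

Two calls with the same set of filters but different values produce identical SQL text, which is the property that lets a plan cache match the statement again.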

Kristen 2009-11-10 15:47:52

Yup, I'll definitely be parameterising the generated query and passing my parameters into sp_executesql. I remember the first time I saw that I had to laugh. Now I'm quite OK with it :)

Robert 2009-11-10 15:57:30

Answer 8

A:

I think Alex Bagnolini's answer is correct. The system wouldn't let me comment on his answer (new acct). The only modification I'd make is to change the second 1 to id.

Sometimes reducing the projection (that is, the column list) allows the DB engine to hit only the index, and not the table, which is faster. This depends on your DB engine and index structure/size, of course. (All rowInsertDate values should be < GETDATE(), so you can skip that comparison.)

SELECT CASE WHEN EXISTS (
  select top 1 id
  from myTable
  where id=7
  and rowInsertDate > '01/01/2009'
) THEN 1 ELSE 0 END AS AnyData

MandoMando 2009-11-10 16:04:43
