views:

547

answers:

8

I'm writing a program that sends an email out at a client's specific local time. I have a .NET method that takes a timezone & time and destination timezone and returns the time in that timezone. So my method is to select every distinct timezone in the database, check if it is the correct time using the method, then select every client out of the database with that timezone(s).

The query will look like one of these. Keep in mind the order of the result set does not matter, so a union would be fine. Which runs faster, or do they really do the same thing?

SELECT email FROM tClient WHERE timezoneID in (1, 4, 9)

or

SELECT email FROM tClient WHERE timezoneID = 1
    UNION ALL SELECT email FROM tClient WHERE timezoneID = 4
    UNION ALL SELECT email FROM tCLIENT WHERE timezoneID = 9

Edit: timezoneID is a foreign key to tTimezone, a table with primary key timezoneID and varchar(20) field timezoneName. Also, I went with WHERE IN since I didn't feel like opening up the analyzer.

Edit 2: Query processes 200k rows in under 100 ms, so at this point I'm done.

+1  A: 

My first guess would be that

SELECT email FROM tClient WHERE timezoneID in (1, 4, 9)
will be faster as it requires only single scan of the table to find the results, but I suggest checking the execution plan for both queries.

Michał Piaskowski
+2  A: 

For most database related performance questions, the real answer is to run it and analyze what the DB does for your dataset. Run an explain plan or trace to see if your query is hitting the proper indexes or create indexes if necessary.

I would likely go with the first using the IN clause since that carries the most semantics of what you want. The timezoneID seems like a primary key on some timezone table, so it should be a foreign key on email and indexed. Depending on the DB optimizer, I would think it should do an index scan on the foreign key index.

David Schlosnagle
+2  A: 

I do not have MS SQL Query Analyzer at hand to actually check my hypothesis, but think that WHERE IN variant would be faster because with UNION server will have to do 3 table scans whereas with WHERE IN will need only one. If you have Query Analyzer check execution plans for both queries.

On the Internet you may often encounter suggestions to avoid using WHERE IN, but that refers to cases where subqueries a used. So this case is out of scope of this recommendation and additionally is easier for reading and understanding.

Dima Malenko
A: 

Thank you, all very good answers, I will try the analyzer when I'm done writing the program and post some results.

Shawn Simon
A: 

Some DBMS's Query Optimizers modify your query to make it more efficient, so depending on the DBMS your using, you probably shouldn't care.

ANaimi
+1  A: 

I think that there are several very important information missing in the question. First of all, it is of great importance weather timezoneID is indexed or not, is it part of the primary key etc. I would advice everyone to have a look at the analyzer, but in my experience the WHERE clause should be faster, especially with an index. The logic is something like, there is an additional overhead in the union query, checking types, column numbers in each etc.

+1  A: 

In the book "SQL Performance Tuning", the authors found that the UNION queries were slower in all 7 DBMS' that they tested (SQL Server 2000, Sybase ASE 12.5, Oracle 9i, DB2, etc.): http://books.google.com/books?id=3H9CC54qYeEC&pg=PA32&vq=UNION&dq=sql+performance+tuning&source=gbs_search_s&sig=ACfU3U18uYZWYVHxr2I3uUj8kmPz9RpmiA#PPA33,M1

The later DBMS' may have optimized that difference away, but it's doubtful. Also, the UNION method is much longer and more difficult to maintain (what if you want a third?) vs. the IN.

Unless you have good reason to use UNION, stick with the OR/IN method.

Matt Rogish
+4  A: 

Hey! These queries are not equivalent.

Results will be same only if assuming that one email belongs only to the one time zone. Of course it does however SQL engine doesn't know that and tries to remove duplicities. So the first query should be faster.

Always use UNION ALL, unless you know why you want to use UNION.

If you are not sure what is difference see this SO question.

Note: that yell belongs to previous version of question.

Jakub Šturc