views:

43

answers:

4

I need to find all records created in a given year from a MySQL database. Is there any way that one of the following would be slower than the other?

WHERE create_date BETWEEN '2009-01-01 00:00:00' AND '2009-12-31 23:59:59'

or

WHERE YEAR(create_date) = '2009'
+3  A: 

I would expect the former to be quicker as it is sargable.

RedFilter
+3  A: 

This:

WHERE create_date BETWEEN '2009-01-01 00:00:00' AND '2009-12-31 23:59:59'

...works better because it doesn't alter the data in the create_date column. That means that if there is an index on the create_date, the index can be used--because the index is on the actual value as it exists in the column.

An index can't be used on YEAR(create_date), because it's only using a portion of the value (that requires extraction).

OMG Ponies
+1  A: 

Whenever you use a function against a column, it must perform the function on every row in order to see if it matches the constant. This prevents the use of an index.

The basic rule of thumb, then, is to avoid using functions on the left side of the comparison.

Sargable means that the DBMS can use an index. Use a column on the left side and a constant on the right side to allow the DBMS to utilize an index.

Even if you don't have an index on the create_date column, there is still overhead on the DBMS to run the YEAR() function for each row. So, no matter what, the first method is most likely faster.

Marcus Adams
A: 

Ideas:

  1. Examine the explain plans; if they are identical, query performance will probably be nearly the same.
  2. Test the performance on a large corpus of test data (which has most of its rows in years other than 2009) on a production-grade machine (ensure that the conditions are the same, e.g. cold / warm caches)

But I'd expect BETWEEN to win. Unless the optimiser is clever enough to do the optimisation for YEAR(), in which case would be the same.

ANOTHER IDEA:

I don't think you care.

If you have only a few records per year, then the query would be fast even if it did a full table scan, because even with (say) 100 years' data, there are so few records.

If you have a very large number of records per year (say 10^8) then the query would be very slow in any case, because returning that many records takes a long time.

You didn't say how many years' data you keep. I guess if it's an archaeological database, you might have a few thousand, in which case you might care if you have a massive load of data.

I find it extremely unlikely that your application will actually notice the difference between a "good" explain plan (using an index range scan) and a "bad" explain plan (full table scan) in this case.

MarkR