I'm trying to find the most efficient (best-performing) way to check whether a date field falls on the current date. Currently we are using:

SELECT     COUNT(Job) AS Jobs
FROM         dbo.Job
WHERE     (Received BETWEEN DATEADD(d, DATEDIFF(d, 0, GETDATE()), 0)
                        AND DATEADD(d, DATEDIFF(d, 0, GETDATE()), 1))
A: 

That's pretty much the best way to do it. You could put the DATEADD(d, DATEDIFF(d, 0, GETDATE()), 0) and DATEADD(d, DATEDIFF(d, 0, GETDATE()), 1) into variables and use those instead, but I don't think that will improve performance.

Mladen Prajdic
+4  A: 

If you just want to find all the records where the Received date is today, and there are records with future Received dates, then what you're doing is (very, very slightly) wrong, because the BETWEEN operator includes values equal to the ending boundary, so you could get records with a Received date equal to midnight tomorrow.

If there is no need to use an index on Received, then all you need to do is check that the date diff with the current datetime is 0...
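
A minimal sketch of that boundary problem, assuming SQL Server. The table variable @Demo and its two sample rows are made up purely for illustration:

```sql
-- Hypothetical sample data: one row today at noon, one exactly at midnight tomorrow.
DECLARE @Today datetime, @Tomorrow datetime
SET @Today = DATEADD(d, DATEDIFF(d, 0, GETDATE()), 0)
SET @Tomorrow = DATEADD(d, 1, @Today)

DECLARE @Demo TABLE (Received datetime)
INSERT INTO @Demo (Received) VALUES (DATEADD(hh, 12, @Today))
INSERT INTO @Demo (Received) VALUES (@Tomorrow)

-- BETWEEN is inclusive on both ends, so the midnight-tomorrow row is counted too.
SELECT COUNT(*) AS WithBetween
FROM @Demo
WHERE Received BETWEEN @Today AND @Tomorrow

-- A half-open range excludes the upper boundary.
SELECT COUNT(*) AS WithHalfOpen
FROM @Demo
WHERE Received >= @Today AND Received < @Tomorrow
```

The first query should count both rows, while the second should count only the row that actually falls on today.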

Where DateDiff(day, received, getdate()) = 0

This predicate is of course not SARGable so it cannot use an index... If this is an issue for this query, then, (assuming you cannot have Received dates in the future??), I would use this instead...

Where Received >= DateAdd(day, DateDiff(Day, 0, getDate()), 0)

If Received dates can be in the future, then you are probably as close to the most efficient as you can be... (Except change the Between to a >= AND < )
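
As a sketch, the original query with BETWEEN swapped for a half-open range (same table and column names as in the question) might look like this:

```sql
SELECT COUNT(Job) AS Jobs
FROM dbo.Job
WHERE Received >= DATEADD(d, DATEDIFF(d, 0, GETDATE()), 0)  -- midnight this morning (inclusive)
  AND Received <  DATEADD(d, DATEDIFF(d, 0, GETDATE()), 1)  -- midnight tonight (exclusive)
```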

Charles Bretana
Charles, even without the index, `DateDiff(day, received, getdate())` is not best because it forces a calculation on every row in the table, using more CPU for no reason.
Emtucifor
@Emtucifor, true, but compared with disk I/O reads, CPU cycles are so insignificant as to be irrelevant. We're talking about a difference of three to four orders of magnitude here.
Charles Bretana
That's true, Charles. Thanks for putting my nitpicking in perspective. :) I do think it's best to recommend the latter where possible because when there IS an index, that will seriously affect I/O.
Emtucifor
@Emtucifor, you are correct again.. and of course, best of all, when you have the discretion to do so, is to put the appropriate index in place, and design the queries to use them.
Charles Bretana
+1  A: 
WHERE
  DateDiff(d, Received, GETDATE()) = 0
Tomalak
I wouldn't do it that way, as it is not SARG-able.
Mitch Wheat
@Mitch Wheat: As long as there is no column with the date part only AND an index on it, nothing you can do will be SARGable anyway.
Tomalak
Tomalak, No - his initial solution is SARGable... Where Received >= {Midnight This morning} And Received < {Midnight Tonight} is SARGable
Charles Bretana
Good points about SARGable. What would be the optimum solution then?
kristof
If you're going to do it this way, then there's no point in separately calculating DateDiff(d, 0, getdate()). Just do a single DateDiff: the DateDiff between Received and getdate() must be zero for all datetimes in today. Where DateDiff(d, Received, getDate()) = 0
Charles Bretana
You're right, I've changed my code accordingly.
Tomalak
The database may not be able to use the index (e.g. MSSQL 2000/2005).
Dennis Cheung
What the heck is "SARG-able"?
James Curran
That means that a search condition can be satisfied by using an index (http://en.wikipedia.org/wiki/Sargable). Calculating the value that is being filtered on makes an operation non-SARGable, because calculated values are in no index. This forces the server to look at each row individually (=slow).
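
To make the difference concrete, here is a rough sketch assuming SQL Server and an index on Received. The index name IX_Job_Received is hypothetical, and comparing the two execution plans is the real way to confirm what the optimizer does:

```sql
-- Hypothetical index on the filtered column.
CREATE INDEX IX_Job_Received ON dbo.Job (Received)
GO

-- Non-SARGable: the column is wrapped in a function, so the predicate
-- has to be evaluated for every row and an index seek is not possible.
SELECT COUNT(Job) AS Jobs
FROM dbo.Job
WHERE DATEDIFF(d, Received, GETDATE()) = 0

-- SARGable: the bare column is compared against values that do not
-- depend on the row, so the index on Received can be used for a range seek.
SELECT COUNT(Job) AS Jobs
FROM dbo.Job
WHERE Received >= DATEADD(d, DATEDIFF(d, 0, GETDATE()), 0)
  AND Received <  DATEADD(d, DATEDIFF(d, 0, GETDATE()), 1)
```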
Tomalak
aha.. I figured it was something like that (I'd wikipedia'd "SARG" but got nothing)
James Curran
This method is not good. Even if there's no index on the ReceivedDate column, why make the database engine perform the conversion on every single row instead of just once? Do the calculation and use the date range syntax.
Emtucifor
A: 

I'm not sure how you're defining "best" but that will work fine.

However, if this query is something you're going to run repeatedly, you should get rid of the GETDATE() function and just put a literal date value in there via whatever programming language you're running this from. Despite their output changing only once every 24 hours, GETDATE(), current_date(), etc. are non-deterministic functions, which means that your RDBMS will probably rule the query out as a candidate for its query cache, if it has one.
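
One possible way to sketch this in T-SQL is to pass the day boundaries in as parameters rather than splicing a literal into the query text (a swapped-in variant of the idea above, not exactly what is described). The date value below is a made-up placeholder that the calling application would supply:

```sql
DECLARE @DayStart datetime, @DayEnd datetime
SET @DayStart = '20240115'                 -- hypothetical value supplied by the caller
SET @DayEnd = DATEADD(d, 1, @DayStart)

-- The query text itself contains no GETDATE() call.
EXEC sp_executesql
    N'SELECT COUNT(Job) AS Jobs
      FROM dbo.Job
      WHERE Received >= @DayStart AND Received < @DayEnd',
    N'@DayStart datetime, @DayEnd datetime',
    @DayStart = @DayStart,
    @DayEnd = @DayEnd
```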

ʞɔıu
A: 

How 'bout

 WHERE
      DATEDIFF(d, Received, GETDATE()) = 0
James Curran
This is not the best way. See Marc Gravell's post for the best way.
Emtucifor
A: 

I would normally use the solution suggested by Tomalak, but if you are really desperate for performance, the best option could be to add an extra indexed field, received_DatePartOnly, which would store the date without the time part, and then use this query:

declare @today as datetime
set @today = datediff(d, 0, getdate())

select     
    count(job) as jobs
from         
    dbo.job
where     
    received_DatePartOnly = @today
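
One way such a column could be set up, sketched under the assumption of SQL Server 2005 or later (for PERSISTED computed columns). The index name is hypothetical, and a plain column maintained by the application or a trigger would be an alternative:

```sql
-- Date-only copy of Received, computed and stored by the engine.
ALTER TABLE dbo.Job
ADD received_DatePartOnly AS DATEADD(d, DATEDIFF(d, 0, Received), 0) PERSISTED
GO

-- Hypothetical index name; this is what makes the equality test cheap.
CREATE INDEX IX_Job_received_DatePartOnly ON dbo.Job (received_DatePartOnly)
GO
```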
kristof
The solution suggested by Tomalak is far from the best way.
Emtucifor
+1  A: 

If you want performance, you want a direct hit on the index, without any CPU etc per row; as such, I would calculate the range first, and then use a simple WHERE query. I don't know what db you are using, but in SQL Server, the following works:

-- @When is the date-and-time we have (here taken from GETDATE())
DECLARE @When datetime, @DayStart datetime, @DayEnd datetime
SET @When = GETDATE()
SET @DayStart = CAST(FLOOR(CAST(@When as float)) as datetime) -- get day only
SET @DayEnd = DATEADD(d, 1, @DayStart)

SELECT     COUNT(Job) AS Jobs
FROM         dbo.Job
WHERE     (Received >= @DayStart AND Received < @DayEnd)
Marc Gravell
@Marc, I am not sure what you mean by a "direct" hit on the index. If you are simply talking about having a calculation on the "other" side of a predicate operator, instead of pre-calculating it before executing the query, then either A) the calculated value is based on some other column in the table and is not the same for each generated row, so it has to be in the SQL, or B) it is the same value for every generated row, in which case the query processor will pre-calculate it anyway, so it WILL only be calculated once, no matter how many rows the query produces.
Charles Bretana