views:

61

answers:

2

I'm working on a recurrence application for events. I have a date range of say, January 1 2010 to December 31 2011. I want to return all of the 3rd Thursdays (arbitrary) of the each month, efficiently. I could do this pretty trivially in code, the caveat is that it must be done in a stored procedure. Ultimately I'd want something like:

CALL return_dates(event_id);

That event_id has a start_date of 1/1/2010 and end_date of 12/31/2011. Result set would be something like:

1/20/2010
2/14/2010
3/17/2010
4/16/2010
5/18/2010
etc. 

I'm just curious what the most efficient method of doing this would be, considering I might end up with a very large result set in my actual usage.

+1  A: 

One idea that comes to mind - you can create a table and store the dates you're interested in there.

Leonid
I'd much prefer to do it dynamically. I suppose that's a last resort possibility, but the number of records could be absurdly high.
James
Number of records depends on the range that you'd like to support. That shouldn't be a question of space if it's only few thousand years that you need to support. Efficiency shouldn't be of concern here. If the queries are ranges and range is a approximately year that storing dates in a table seems like a acceptable and efficient solution.
Leonid
Can't see anything wrong if you store a row per month for 2000 years in the table. It's 24000 rows per 2000 years, and you'd have efficient range queries.
Leonid
Right, but that's for one event. Now assume I have 3 million events.
James
I think I see what you're saying now. Still seems a little hokey. It would accomplish what I'm looking to do but I'd still like to find an elegant solution.
James
Well, in one case you have the results and you just need to retrieve them from Hard Drive, or cache, which is more likely if your table is regularly accessed, and caching is properly configured. 3 millions events don't say much to me, as it's relative to time in which they are requested. In which time interval do you expect 3 million events to occur?
Leonid
Essentially you need to answer question: what is more efficient - to store dates in the database and access them, or calculate them with a for loop everytime. That could also depend very much on how much calculations you need to do per query. I definitely go for DB table, because it's also less likely to make a mistake in calculations - you can see the actual data, and easily find an error, you can operate with data using queries.
Leonid
With a for loop: you can retrieve the first day of month and figure out what day of the week it is, and then do a simple calculation. I believe MySQL stored procedures + SQL should provide all these tools. Alternatively you could also go into detail and identify which year is leap and which is not, then you'd need to figure out day of the week only for the first month in a given range. DAYOFWEEK() is one function you could use.
Leonid
Fair enough, seems like I'm going to have to brute force it one way or another.
James
A: 

Ok, I haven't tested it, but I think the most efficient way of doing it is via a tally table which is a useful thing to have in the db anyway:

IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[num_seq]') AND type in (N'U'))
DROP TABLE [dbo].[num_seq];

SELECT TOP 100000 IDENTITY(int,1,1) AS n
INTO num_seq
FROM MASTER..spt_values a, MASTER..spt_values b;

CREATE UNIQUE CLUSTERED INDEX idx_1 ON num_seq(n);

You can then use this to build up the date range between the two dates. It's fast because it just uses the index (in fact often faster than a loop, so I'm led to believe)

create procedure getDates
    @eventId int
AS
begin

declare @startdate datetime
declare @enddate datetime

--- get the start and end date, plus the start of the month with the start date in
select @startdate=startdate, 
       @enddate=enddate
       from events where eventId=@eventId

  select
         @startdate+n AS date,
       from
         dbo.num_seq tally
       where
        tally.n<datediff(@monthstart, @enddate) and
        Datepart(dd,@startdate+n) between 15 and 21 and
        Datepart(dw, @startdate+n) = '<day>'

Aside from getting the start and end dates, the third x id each month must be between the 15th and the 21st inclusive. The day names in that range must be unique, so we can locate it straight away.

If you wanted the second dayname, just modify the range appropriately or use a parameter to calculate it.

It constucts a date table using the startdate, and then adding days on (via the list of numbers in the tally table) until it reaches the end date.

Hope it helps!

mr_miles