I have user login data with timestamps, and I would like to get a histogram of logins by year, but with the year starting at an arbitrary date. For example, I want the following sort of information:

1 May 2005 - 30 Apr 2006 | 525
1 May 2006 - 30 Apr 2007 | 673
1 May 2007 - 30 Apr 2008 | 892
1 May 2008 - 30 Apr 2009 | 1047

The labels in the first column are not important, but the date ranges are. I know I can break it down by calendar year with:

SELECT YEAR([date]) AS [year], COUNT(*) AS cnt 
FROM logins
GROUP BY YEAR([date])
ORDER BY [year]

But that doesn't give me the date ranges I want. How can this be done?

+1  A: 

If you can define your date ranges in a separate table, with a label and a start and end date for each range, you can join that table to your main query. Something like this, depending on your tables:

Select Count(*) as NoLogons, DateRangeLabel
From logins a
inner join
(
Select
DateRangeLabel, StartDate, EndDate
From tblMyDates 
) b
on a.date between b.startdate and b.enddate
Group by DateRangeLabel
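
A minimal sketch of how such a range table might be set up, assuming the tblMyDates layout implied by the query above (the column types and labels here are guesses, not from the original post). Note that BETWEEN is inclusive at both ends, so if the login column carries a time of day, an EndDate of 30 Apr at midnight would drop logins made later that day. This sketch sidesteps that by storing an exclusive end date and comparing with >= and <:

```sql
-- Hypothetical range table; EndDate is exclusive in this sketch.
CREATE TABLE tblMyDates (
    DateRangeLabel varchar(30),
    StartDate      datetime,
    EndDate        datetime
);

INSERT INTO tblMyDates VALUES ('May 2005 - Apr 2006', '20050501', '20060501');
INSERT INTO tblMyDates VALUES ('May 2006 - Apr 2007', '20060501', '20070501');
INSERT INTO tblMyDates VALUES ('May 2007 - Apr 2008', '20070501', '20080501');
INSERT INTO tblMyDates VALUES ('May 2008 - Apr 2009', '20080501', '20090501');

-- Half-open comparison: each login matches at most one range.
SELECT b.DateRangeLabel, COUNT(*) AS NoLogons
FROM logins a
INNER JOIN tblMyDates b
    ON a.[date] >= b.StartDate AND a.[date] < b.EndDate
GROUP BY b.DateRangeLabel
ORDER BY MIN(b.StartDate);
```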
u07ch
+1  A: 

declare @baseDate datetime
set @baseDate = '1 May 2005'

SELECT
    datediff(year, @baseDate, [date]) AS YearBucket 
    ,COUNT(*) AS cnt 
FROM logins
GROUP BY datediff(year, @baseDate, [date])
ORDER BY datediff(year, @baseDate, [date])
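
One thing to keep in mind with the query above: DATEDIFF(year, ...) counts the calendar-year boundaries crossed between the two dates, not elapsed 12-month periods, so the buckets still break on 1 January rather than on 1 May. A quick check, using only literal dates:

```sql
-- DATEDIFF(year, ...) only looks at the year part of each date.
SELECT
    DATEDIFF(year, '20050501', '20051231') AS same_calendar_year,  -- 0
    DATEDIFF(year, '20050501', '20060101') AS next_calendar_year;  -- 1
```

A login on 1 Jan 2006 belongs to the 1 May 2005 - 30 Apr 2006 range, yet it lands in bucket 1 here.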

EDIT - apologies, you are correct. Here is a fixed version (I should have used a test table to start with...)

create table logins (date datetime, foo int)
insert logins values ('1 may 2005', 1)
insert logins values ('1 apr 2006', 2)
insert logins values ('1 may 2006', 3)

declare @baseDate datetime
set @baseDate = '1 May 2005'

SELECT
    datediff(day, @baseDate, [date]) / 365 AS YearBucket 
    ,COUNT(*) AS cnt 
FROM logins
GROUP BY datediff(day, @baseDate, [date]) / 365
ORDER BY datediff(day, @baseDate, [date]) / 365

Change the datediff units if you want more granularity than days.
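
Dividing by 365 drifts by one day for every 29 February crossed. For example, 30 Apr 2009 is 1460 days after 1 May 2005, so it lands in bucket 4 rather than bucket 3. One possible set-based alternative, sketched here and not part of the original answer, is to start from DATEDIFF(year, ...) and subtract 1 whenever the anniversary of @baseDate has not yet been reached in that year:

```sql
DECLARE @baseDate datetime;
SET @baseDate = '20050501';

SELECT YearBucket, COUNT(*) AS cnt
FROM (
    SELECT
        -- Calendar-year boundaries crossed since @baseDate ...
        DATEDIFF(year, @baseDate, [date])
        -- ... minus 1 if this date falls before that year's anniversary.
        - CASE WHEN DATEADD(year, DATEDIFF(year, @baseDate, [date]), @baseDate) > [date]
               THEN 1 ELSE 0 END AS YearBucket
    FROM logins
) AS bucketed
GROUP BY YearBucket
ORDER BY YearBucket;
```

Against the three test rows above, this should place 1 May 2005 and 1 Apr 2006 in bucket 0 and 1 May 2006 in bucket 1. Logins before @baseDate get negative bucket numbers.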

EDIT #2 - ok, here is a more robust solution that handles leap years :)

EDIT #3 - Actually, this doesn't handle leap years; instead, it allows variable intervals of time to be specified. Go with dateadd(year, 1, @baseDate) for the leap-year-safe approach.

declare @baseDate datetime, @interval datetime
-- @interval is expressed as an offset from the zero date (1/1/1900), so '1901' means a fixed 365 days
select @baseDate = '1 May 2005', @interval = '1901'

declare @timeRanges table (beginIntervalInclusive datetime, endIntervalExclusive datetime)
declare @i int
set @i = 1
while @i <= 10
begin
    insert @timeRanges values(@baseDate, @baseDate + @interval)
    set @baseDate = @baseDate + @interval
    set @i = @i + 1
end

SELECT
    tr.beginIntervalInclusive,
    tr.endIntervalExclusive,
    COUNT(*) AS cnt 
FROM logins join @timeRanges as tr
    on logins.date >= tr.beginIntervalInclusive
        and logins.date < tr.endIntervalExclusive
GROUP BY  tr.beginIntervalInclusive, tr.endIntervalExclusive
ORDER BY  tr.beginIntervalInclusive
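
For reference, a sketch of the leap-year-safe variant mentioned in EDIT #3, with DATEADD(year, ...) in place of @baseDate + @interval. Each boundary is derived from the fixed base date and the loop counter rather than accumulated step by step; that detail is a choice made for this sketch, not something from the original answer:

```sql
DECLARE @baseDate datetime;
SET @baseDate = '20050501';

DECLARE @timeRanges table (beginIntervalInclusive datetime, endIntervalExclusive datetime);
DECLARE @i int;
SET @i = 0;
WHILE @i < 10
BEGIN
    -- Range i runs from base + i years (inclusive) to base + (i + 1) years (exclusive).
    INSERT @timeRanges
    VALUES (DATEADD(year, @i, @baseDate), DATEADD(year, @i + 1, @baseDate));
    SET @i = @i + 1;
END;

SELECT
    tr.beginIntervalInclusive,
    tr.endIntervalExclusive,
    COUNT(*) AS cnt
FROM logins
JOIN @timeRanges AS tr
    ON logins.[date] >= tr.beginIntervalInclusive
   AND logins.[date] <  tr.endIntervalExclusive
GROUP BY tr.beginIntervalInclusive, tr.endIntervalExclusive
ORDER BY tr.beginIntervalInclusive;
```

With a base date of 1 May, every range then starts on 1 May and ends just before the following 1 May, including the range that spans 29 Feb 2008.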

Gives the exact same results, just without the year in front.
cdeszaq
@cdeszaq - you are right, sorry about that. Query is fixed now, I should have used a test table to start with..
:) I had just gotten to that myself...but will it handle leap years correctly, or will that introduce off-by-1 errors?
cdeszaq
@cdeszaq - ok, here is a more robust solution :)
so close...still off by 1 for 1 jan 2005...end interval gets set to dec 31, rather than jan 1 of the next year for 2008
cdeszaq
Got it...DATEADD(year,1, @baseDate) is needed instead of @baseDate + @interval
cdeszaq
@cdeszaq - are you sure @baseDate + @interval isn't working? Eyeballing the contents of @timeRanges, it looks good to me. I'm not sure how 1/1/2005 fits in, my values define ranges for 5/1/2005 - 4/29/2015.
The interval was being incorrectly applied in the 2008 leap year, missing by 1 day, putting the ending interval 1 day short. I am guessing that the '+' operator on dates does not correctly handle leap years. The DATEADD function does seem to though.
cdeszaq
@cdeszaq - ah I see, difference of expectation. You want a well defined 'year', I was on wrong track thinking that a variable interval of time would be more useful. I think the + shouldn't handle leap years since I'm adding the interval of 1900-1901 (which is a non-leap year). I see dateadd() with YEAR is more what you need.