views:

154

answers:

1

I've got an Occurrences table that contains one row for each time a user took an action. A user take an action multiple times per day. It looks like this:

Date      Username
------    --------
1/1/9     User1
1/1/9     User1
1/1/9     User2
1/2/9     User1
1/2/9     User3
1/3/9     User1
1/3/9     User1
1/3/9     User1
1/3/9     User2
1/3/9     User3
1/4/9     User1
1/5/9     User1
1/6/9     User1
1/7/9     User1

For each day in the range, I'd like to show the count of people who have taken the action multiple times - let's say, between 2 and 5 times, between 6 and 10 times, and more than 10 times. However, I only consider repeating the action on a different day to count as another instance of that action. For example, if a user did the thing 3 times on the first day and then again any # of times on the next day, I see that user has having done the action 2 times and hence should be in the 2-to-5 times column.

The result set corresponding to the above sample data would be:

        #_of_people       #_of_people       #_of_people       
        who_did_action    who_did_action    who_did_action        
Date    2to5_times        6to10_times       more_than_10      Total
-----   --------------    --------------    --------------    -----
1/1/9   0                 0                 0                 0
1/2/9   1                 0                 0                 1
1/3/9   3                 0                 0                 3   
...    
1/7/9   0                 1                 0                 1        

Note that each row of the result is counting the # of repeat actions for that specific day only - not cumulative.

  • The 1/1/9 row is all zeros since it's the first day and all actions are considered to be the first.
  • The 1/2/9 row is 1, 0, 0, 1 because only User1 has repeated - it's User3's first time.
  • The 1/3/9 row is 3, 0, 0, 3 because User1 has repeated twice, User2 has repeated once, and User3 has also repeated once.
  • The 1/7/9 row is 0, 1, 0, 1 because User1 has repeated 6 times.
+1  A: 

Best I could come up with, untested:

DECLARE @username VARCHAR(40)
DECLARE @date DATETIME
DECLARE @counter INT
    SET @counter = 1

WITH occur AS (
     SELECT DISTINCT
            CONVERT(VARCHAR(10), o.date, 101) AS dt,
            o.username
       FROM OCCURRENCES o
   ORDER BY o.username, dt),
     occur_rank AS (
     SELECT x.dt,
            @username = x.username,
            @date = CAST(x.dt AS DATETIME),
            CASE WHEN @username = x.username AND @date + 1 = x.date THEN @counter = @counter + 1 ELSE @counter = 1 END AS rank
       FROM occur x
   ORDER BY x.username, x.dt)
  SELECT or.dt,
         SUM(CASE WHEN or.rank BETWEEN 2 AND 5 THEN 1 ELSE 0 END) AS 2_to_5,
         SUM(CASE WHEN or.rank BETWEEN 6 AND 10 THEN 1 ELSE 0 END) AS 6_to_10
    FROM occur_rank or
GROUP BY or.dt

If you can get the numbering to reset on a date gap, the rest is easy. But none of the SQL Server ranking functions support that. So that leaves manually incrementing a variable.

The first CTE, occur just returns a list of usernames associated with dates. The second CTE builds on it, adding the ranking for the final output to pivot on.

OMG Ponies
The convert is necessary to get rid of the time portion of the `DATE` column, because it is a DATETIME data type.
OMG Ponies
You read my mind. :) I'm trying this now. There should be a comma after the "as dt" and an "as" between the COUNT(*) and the 'num_occur' I believe, right?
Robert
@Robert: Fixed the missing comma, thx. `AS` isn't required for defining table aliases, but I should've been consistent.
OMG Ponies
Instead of using Case for two outcomes, use the IIF function. It's neater.
Tor Valamo
But "num_occur" is used here as a column alias, not a table alias I think?
Robert
@Tor: `CASE` is ANSI standard, supported on other databases.
OMG Ponies
@Robert: num_occur is a column alias
OMG Ponies
@OMG Ponies - Yeah, but CONVERT() isn't, so what's your point? ;)
Tor Valamo
@Tor Vaolamo: CONVERT is supported on Oracle and MySQL, besides SQL Server.
OMG Ponies
@Robert - Here's the results I'm getting from the query (only showing the 2-5 and 6-10 columns) versus what I'd expect from the data set in the question: 1/1/9 is 1, 0. 1/2/9 is 0, 0. 1/3/9 is 1, 0. 1/4/9 is 0, 0. 1/5/9 is 0, 0. 1/6/9 is 0, 0. 1/7/9 is 0, 0.
Robert
I just saw that IIF is not supported in sql server (though it is in access, which is why I thought it might work).
Tor Valamo
@Robert: Check that the user counts per day match what you expect.
OMG Ponies
@OMG - I think I see what it is. I'm trying to return not the # of occurrences on a given day, but rather the # of *repeat occurrences* on a given day, where multiple occurrences per day don't count. Said differently, 1/3/9 is User's 3rd day on the site, so this would add +1 to the 2-5 column for that day. 1/3/9 was also User2's 2nd day on the site (another +1 to the 2-5 column) and User3's 2nd day as well (another +1 to the 2-5 column = 3 for that day). As another example, 1/7/9 was User1's 7th time at the site, so that puts a +1 in the 6-10 column.
Robert
@Robert: Understood. I won't have time to update answer till later today.
OMG Ponies
@OMG - I'm going to go through what you just posted.
Robert