views:

643

answers:

3

Jeff recently asked this question and got some great answers.

Jeff's problem revolved around finding the users that have had (n) consecutive days where they have logged into a system. Using a database table structure as follows:

Id      UserId   CreationDate
------  ------   ------------
750997      12   2009-07-07 18:42:20.723
750998      15   2009-07-07 18:42:20.927
751000      19   2009-07-07 18:42:22.283

Read the original question first for clarity and then...

I was intrigued by the problem of determining how many distinct (n)-day periods for a user.

Could one craft a speedy SQL query that could return a list of users and the number of distinct (n)-day periods they have?

EDIT: as per a comment below If someone has 2 consecutive days, then a gap, then 4 consecutive days, then a gap, then 8 consecutive days. It would be 3 "distinct 4 day periods". The 8 day period should count as two back-to-back 4 day periods.

A: 

This works quite nicely with the test data I have.

DECLARE @days int
SET @days = 30

SELECT DISTINCT l.UserId, (datediff(d,l.CreationDate, -- Get first date in contiguous range
(
    SELECT min(a.CreationDate ) as CreationDate
    FROM UserHistory a
     LEFT OUTER JOIN UserHistory b 
      ON a.CreationDate = dateadd(day, -1, b.CreationDate ) AND
      a.UserId = b.UserId
    WHERE b.CreationDate IS NULL AND
     a.CreationDate >= l.CreationDate AND
     a.UserId = l.UserId
) )+1)/@days as cnt
INTO #cnttmp
FROM UserHistory l
    LEFT OUTER JOIN UserHistory r 
     ON r.CreationDate = dateadd(day, -1, l.CreationDate ) AND
     r.UserId = l.UserId
WHERE r.CreationDate IS NULL
ORDER BY l.UserId

SELECT UserId, sum(cnt)
FROM #cnttmp
GROUP BY UserId
HAVING sum(cnt) > 0
Ron Tuffin
+1  A: 

My answer appears to have not appeared...

I'll try again...

Rob Farley's answer to the original question has the handy benefit of including the number of consecutive days.

with numberedrows as
(
        select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
        from tablename
)
select min(CreationDate), max(CreationDate), count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset

Using integer division, simply dividing the consecutive number of days gives the number of "distinct (n)-day periods" covered by the whole consecutive period...
- 2 / 4 = 0
- 4 / 4 = 1
- 8 / 4 = 2
- 9 / 4 = 2
- etc, etc

So here is my take on Rob's answer for your needs...
(I really LOVE Rob's answer, go read the explanation, it's inspired thinking!)

with
    numberedrows (
        UserID,
        TheOffset
    )
as
(
    select
        UserID,
        row_number() over (partition by UserID order by CreationDate)
            - DATEDIFF(DAY, 0, CreationDate) as TheOffset
    from
        tablename
),
    ConsecutiveCounts(
        UserID,
        ConsecutiveDays
    )
as
(
    select
        UserID,
        count(*) as ConsecutiveDays
    from
        numberedrows
    group by
        UserID,
        TheOffset
)
select
    UserID,
    SUM(ConsecutiveDays / @period_length) AS distinct_n_day_periods
from
    ConsecutiveCounts
group by
    UserID

The only real difference is that I take Rob's results and then run it through another GROUP BY...

Dems
+1  A: 

So - I'm going to start with my query from the last question, which listed each run of consecutive days. Then I'm going to group that by userid and NumConsecutiveDays, to count how many runs of days there are for those users.

with numberedrows as
(
        select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
        from tablename
)
,
runsOfDay as
(
select min(CreationDate), max(CreationDate), count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset
)
select UserID, NumConsecutiveDays, count(*) as NumOfRuns
from runsOfDays
group by UserID, NumConsecutiveDays
;

And of course, if you want to filter this to only consider runs of a certain length, then put "where NumConsecutiveDays >= @days" in the last query.

Now, if you want to count a run of 16 days as three 5-day runs, then each run will count as NumConsecutiveDays / @runlength of these (which will round down for each integer). So now instead of just counting how many there are of each, use SUM instead. You could use the query above and use SUM(NumOfRuns * NumConsecutiveDays / @runlength), but if you understand the logic, then the query below is a bit easier.

with numberedrows as
(
        select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
        from tablename
)
,
runsOfDay as
(
select min(CreationDate), max(CreationDate), count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset
)
select UserID, sum(NumConsecutiveDays / @runlength) as NumOfRuns
from runsOfDays
where NumConsecutiveDays >= @runlength
group by UserID
;

Hope this helps,

Rob

Rob Farley