views:

740

answers:

6

I have the table of following structure:

UserID   StartedOn          EndedOn
1        2009-7-12T14:01    2009-7-12T15:01 
2        2009-7-12T14:30    2009-7-12T14:45
3        2009-7-12T14:47    2009-7-12T15:30
4        2009-7-12T13:01    2009-7-12T17:01
5        2009-7-12T14:15    2009-7-12T18:01
6        2009-7-12T11:01    2009-7-12T19:01
1        2009-7-12T16:07    2009-7-12T19:01

I need to find the maximal number of concurrent users that were on line. In the above table the result would be 5 because users set1={1,2,4,5,6} and set2={1,3,4,5,6} were online in the same period.

Do you have an idea how one could calculate this using T-SQL only?

A: 

you do a selfjoin on that table

codymanix
A: 

I did the work using integers rather than datetime fields, but I believe the following sql snippet gets you what you want.

Basically, I compared the start and end date of each user against each other using a self-join. If User A started before or at the same time as User B AND User B started before or at the same time as User A ended, they are running concurrently. Thus, I found the user with the max number of concurrent users (and added 1 for themselves since I excluded them in the self-join.)

I noticed you have multiple rows for each user. Please note the sql below assumes the same user can't be running multiple instances at once (concurrently.) If this assumption doesn't hold true, I'm hoping you have an additional column which is unique per row. Use this column rather than UserId throughout the sql routine.

I've gotten you really close. I hope this helps. Best of luck.

DECLARE @Table TABLE 
(
  UserId int, 
  StartedOn int,
  EndedOn int
)

Insert Into @Table
Select 1, 1, 3
union
Select 2, 2, 4
union
Select 3, 3, 5
union
Select 4, 4, 6
union
Select 5, 7, 8
union
Select 6, 9, 10
union
Select 7, 9, 11
union
Select 8, 9, 12
union
Select 9, 10, 12
union
Select 10, 10, 13

--Select * from @Table

Select 
    A.UserId, 
    Count(B.UserId) + 1 as 'Concurrent Users'
FROM @Table A, @Table B
WHERE A.StartedOn <= B.StartedOn
AND B.StartedOn <= A.EndedOn
AND A.UserId != B.UserId
Group By A.UserId
Order By Count(B.UserId) Desc
Ben Griswold
-1 This does not count the number of *simulateously* concurrent users, but the number of users any user is concurrent with. Eg. an user that holds a long session may see 10 different users come and by, but the max number of concurrent users was only 2. Your query will return 11 none the less.
Remus Rusanu
I concur with Remus Rusanu's assessment. I didn't see this before posting, but he's absolutely correct. My solution will not work per his comments. Nice catch. +1 on your comment.
Ben Griswold
A: 

A naive approach:
You can test if another user b is currently logged in when user a logs in with

a.StartedOn BETWEEN b.StartedOn AND b.EndedOn

And someone has to be the "final logon" to the set of "the most concurrent users".
If you now go through all records (as a) and check how many other users (b) where logged in at the time and then order the list (desc) the first result is the maximum number of concurrent users.

SELECT
  a.id, a.UserId, a.StartedOn, a.EndedOn,  
  (  
    SELECT    
      Count(*)      
    FROM    
      logons as b      
    WHERE    
      a.StartedOn BETWEEN b.StartedOn AND b.EndedOn            
  ) as c
FROM
  logons as a 
ORDER BY
  c desc

And now read http://stackoverflow.com/questions/621884/database-development-mistakes-made-by-appdevelopers to see how inefficient (or even wrong) this is ;-)
e.g. you have a large temporary table that the order by operates on without any index to help the sql server.

(and btw: I tested this with MySQL because I don't have a sql server at hand right now)

VolkerK
+6  A: 

Clearly the number of concurrent users only changes when a user either starts or ends a period, so it is enough to determine the number of concurrent users during starts and ends. So, reusing test data provided by Remus (thank you Remus):

DECLARE @Table TABLE 
(
  UserId int, 
  StartedOn datetime,
  EndedOn datetime
);

insert into @table (UserId, startedOn, EndedOn)
select 1, '2009-7-12 14:01', '2009-7-12 15:01'
union all select 2, '2009-7-12 14:30', '2009-7-12 14:45'
union all select 3, '2009-7-12 14:47', '2009-7-12 15:30'
union all select 4, '2009-7-12 13:01', '2009-7-12 17:01'
union all select 5, '2009-7-12 14:15', '2009-7-12 18:01'
union all select 6, '2009-7-12 11:01', '2009-7-12 19:01'
union all select 1, '2009-7-12 16:07', '2009-7-12 19:01';

SELECT MAX(ConcurrentUsers) FROM(
SELECT COUNT(*) AS ConcurrentUsers FROM @table AS Sessions 
JOIN 
(SELECT DISTINCT StartedOn AS ChangeTime FROM @table
) AS ChangeTimes
ON ChangeTime >= StartedOn AND ChangeTime < EndedOn 
GROUP BY ChangeTime
) AS ConcurrencyAtChangeTimes
-------
5

BTW using DISTINCT per se is not a mistake - only abusing DISTINCT is. DISTINCT is just a tool, using it in this context is perfectly correct.

AlexKuznetsov
MAX() on a derived table, DISTINCT on a sub-query, GROUP BY. No offense but isn't that exactly what http://stackoverflow.com/questions/621884/database-development-mistakes-made-by-appdevelopers is about?
chendral
Using GROUP BY is what - a common mistake? Are you serious?
AlexKuznetsov
The point is that his query is highly inefficient.
orlandu63
Do you have any benchmarks to prove it? How do you know that it is less efficient that other alternatives?
AlexKuznetsov
A: 

I tried AlexKuznetsov's solution but the result was 49 :(

My solution:

/* Create temporary table and set all dates into 1 column,
so we can sort by this one column */
DECLARE @tmp table (
    Dates smalldatetime,
    IsStartedDate bit )

INSERT INTO @tmp
    SELECT StartedOn, 1 FROM stats
    UNION ALL
    SELECT EndedOn, 0 FROM stats

DECLARE @currentlogins int, @highestlogins int, @IsStartedDate bit;
SET @currentlogins = 0;
SET @highestlogins = 0;

DECLARE tmp_cursor CURSOR FOR 
SELECT IsStartedDate FROM @tmp
ORDER BY Dates ASC

OPEN tmp_cursor

/* Step through every row, if it's a starteddate increment @currentlogins else decrement it
When @currentlogins is higher than @highestlogins set @highestlogins to the new highest value */
FETCH NEXT FROM tmp_cursor 
INTO @IsStartedDate

WHILE @@FETCH_STATUS = 0
BEGIN
    IF (@IsStartedDate = 1)
    BEGIN
     SET @currentlogins = @currentlogins + 1;
     IF (@currentlogins > @highestlogins)
      SET @highestlogins = @currentlogins;
    END
    ELSE
     SET @currentlogins = @currentlogins - 1;

    FETCH NEXT FROM tmp_cursor 
    INTO @IsStartedDate
END

CLOSE tmp_cursor
DEALLOCATE tmp_cursor

SELECT @highestlogins AS HighestLogins
ZippyV
@ZippyV, yes, I fixed my query as soon as I had test data. It returns 5 now.
AlexKuznetsov
A: 

You can order all events on date order and compute a running aggregate of current users logged in:

DECLARE @Table TABLE 
(
  UserId int, 
  StartedOn datetime,
  EndedOn datetime
);

insert into @table (UserId, startedOn, EndedOn)
select 1, '2009-7-12 14:01', '2009-7-12 15:01'
union all select 2, '2009-7-12 14:30', '2009-7-12 14:45'
union all select 3, '2009-7-12 14:47', '2009-7-12 15:30'
union all select 4, '2009-7-12 13:01', '2009-7-12 17:01'
union all select 5, '2009-7-12 14:15', '2009-7-12 18:01'
union all select 6, '2009-7-12 11:01', '2009-7-12 19:01'
union all select 1, '2009-7-12 16:07', '2009-7-12 19:01';

with cte_all_events as (
select StartedOn as Date
    , +1 as Users
    from @Table
union all 
select EndedOn as Date
    , -1 as Users
    from @Table),
cte_ordered_events as (
select Date
    , Users
    , row_number() over (order by Date asc) as EventId
    from cte_all_events)
, cte_agg_users as (
  select Date
    , Users
    , EventId
    , (select sum(Users) 
     from cte_ordered_events agg
     where agg.EventId <= e.EventId) as AggUsers
    from cte_ordered_events e)
select * from cte_agg_users


2009-07-12 11:01:00.000 1 1 1
2009-07-12 13:01:00.000 1 2 2
2009-07-12 14:01:00.000 1 3 3
2009-07-12 14:15:00.000 1 4 4
2009-07-12 14:30:00.000 1 5 5
2009-07-12 14:45:00.000 -1 6 4
2009-07-12 14:47:00.000 1 7 5
2009-07-12 15:01:00.000 -1 8 4
2009-07-12 15:30:00.000 -1 9 3
2009-07-12 16:07:00.000 1 10 4
2009-07-12 17:01:00.000 -1 11 3
2009-07-12 18:01:00.000 -1 12 2
2009-07-12 19:01:00.000 -1 13 1
2009-07-12 19:01:00.000 -1 14 0

Once you have this in place, finding the number of maximum concurrent sessions is trivial. As you see you have two moments when you had 5 users, at 14:30 (when user 2 logged in) and at 14:47 (when user 3 logged in). Just replace the last query that selects from the CTE to get the actual max:

select top(1) AggUsers 
    from cte_agg_users
    order by AggUsers desc

This solution uses CTEs so it will only work on SQL 2k5, if you're still on SQL 2000 you'll have to rewrite it using derived tables instead of CTEs.

Remus Rusanu
Remus, I borrowed your test data - thanks!
AlexKuznetsov