Hello all.

I'm writing a statistics-based application on top of a SQLite database. One table records when users log in and log out (SessionStart and SessionEnd DateTimes).

What I'm looking for is a query that shows which hours users have been logged in, in a sort of line-graph way: between 12:00 and 1:00 AM there were 60 users logged in (at any point), between 1:00 and 2:00 AM there were 54 users logged in, etc.

And I want to be able to run a SUM of this, which is why I can't bring the records into .NET and iterate through them that way.

I've come up with a rather primitive approach, a subquery for each hour of the day, but it has proved far too slow. I need to be able to calculate this for a couple hundred thousand records in a split second.

SELECT
    case
        when (strftime('%s',datetime(date(sessionstart), '+0 hours')) > strftime('%s',sessionstart)
        AND strftime('%s',datetime(date(sessionstart), '+0 hours')) < strftime('%s',sessionend))
        OR (strftime('%s',datetime(date(sessionstart), '+1 hours')) > strftime('%s',sessionstart)
        AND strftime('%s',datetime(date(sessionstart), '+1 hours')) < strftime('%s',sessionend))
        OR (strftime('%s',datetime(date(sessionstart), '+0 hours')) < strftime('%s',sessionstart)
        AND strftime('%s',datetime(date(sessionstart), '+1 hours')) > strftime('%s',sessionend))
    then 1 else 0 end as hour_zero,
    ... hour_one,
    ... hour_two,
    ........ hour_twentythree
FROM UserSession

I'm wondering whether there is a better way to determine if the span between two DateTimes covers a particular hour (best case, how many times it crossed that hour if the user was logged in across multiple days, but that's not necessary)?

The only other idea I had is to have an "hour" table specific to this, and just tally up the hours the user has been seen at runtime, but I feel like this is more of a hack than the previous SQL.

Any help would be greatly appreciated!

+1  A: 

I'd go with your "hack" idea, but I don't consider it a hack, really - after the hour's over, the value won't ever change, so why not calculate it once and be done with it? Rollup tables are perfectly valid for this and will yield consistent query times regardless of how many users you've been tracking.

You could calculate these every hour or alternatively, you could increment each hour's counter at login/logout events and avoid a scheduled task.
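
A minimal sketch of the "increment at logout" variant, using Python's built-in sqlite3 module. The `HourlyRollup` table, its columns, and the helper names are hypothetical and not part of the original schema; the sketch also assumes a session counts toward every clock hour it touches, including the hour it ends in.

```python
import sqlite3
from datetime import datetime, timedelta

def hours_touched(start, end):
    """Yield the start of every clock hour that the session [start, end] touches."""
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour <= end:
        yield hour
        hour += timedelta(hours=1)

def record_logout(conn, start, end):
    """Bump the rollup counter once for each hour the finished session touched."""
    rows = [(h.strftime("%Y-%m-%d %H:00:00"),) for h in hours_touched(start, end)]
    # Make sure a row exists for each hour, then increment it.
    conn.executemany(
        "INSERT OR IGNORE INTO HourlyRollup (HourStart, SessionCount) VALUES (?, 0)",
        rows)
    conn.executemany(
        "UPDATE HourlyRollup SET SessionCount = SessionCount + 1 WHERE HourStart = ?",
        rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
# Hypothetical rollup table: one row per calendar hour.
conn.execute(
    "CREATE TABLE HourlyRollup ("
    "HourStart TEXT PRIMARY KEY, SessionCount INTEGER NOT NULL)")

record_logout(conn, datetime(2009, 1, 1, 0, 15), datetime(2009, 1, 1, 2, 5))
record_logout(conn, datetime(2009, 1, 1, 1, 30), datetime(2009, 1, 1, 1, 45))

rollup = conn.execute(
    "SELECT HourStart, SessionCount FROM HourlyRollup ORDER BY HourStart").fetchall()
print(rollup)
```

Reporting then becomes a plain SELECT (or SUM) over `HourlyRollup`, whose size depends on the number of hours tracked rather than the number of sessions.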

jasondoucette
+2  A: 

Played around a bit on Sybase (T-SQL dialect) and came up with this query.

SELECT
    StartHour AS Hour, COUNT(*) AS SessionCount
FROM
    (SELECT
        CONVERT(DATETIME, '2001-01-01 ' + Hour + ':00:00') as StartHour,
        DATEADD(HH, 1, CONVERT(DATETIME, '2001-01-01 ' + Hour + ':00:00')) as EndHour
    FROM
        (SELECT '00' AS Hour UNION ALL SELECT '01' AS Hour UNION ALL
        SELECT '02' AS Hour UNION ALL SELECT '03' AS Hour UNION ALL
        SELECT '04' AS Hour UNION ALL SELECT '05' AS Hour UNION ALL
        SELECT '06' AS Hour UNION ALL SELECT '07' AS Hour UNION ALL
        SELECT '08' AS Hour UNION ALL SELECT '09' AS Hour UNION ALL
        SELECT '10' AS Hour UNION ALL SELECT '11' AS Hour UNION ALL
        SELECT '12' AS Hour UNION ALL SELECT '13' AS Hour UNION ALL
        SELECT '14' AS Hour UNION ALL SELECT '15' AS Hour UNION ALL
        SELECT '16' AS Hour UNION ALL SELECT '17' AS Hour UNION ALL
        SELECT '18' AS Hour UNION ALL SELECT '19' AS Hour UNION ALL
        SELECT '20' AS Hour UNION ALL SELECT '21' AS Hour UNION ALL
        SELECT '22' AS Hour UNION ALL SELECT '23' AS Hour) AS Hours
    ) AS T1,
    UserSession AS T2
WHERE
    -- Logged on during, logged off during
    (T2.SessionStart >= T1.StartHour AND T2.SessionEnd < T1.EndHour)
    -- Logged on before, logged off during
    OR (T2.SessionStart < T1.StartHour AND T2.SessionEnd >= T1.StartHour AND T2.SessionEnd < T1.EndHour)
    -- Logged on during, logged off after
    OR (T2.SessionStart >= T1.StartHour AND T2.SessionStart < T1.EndHour AND T2.SessionEnd >= T1.EndHour)
    -- Logged on before, logged off after
    OR (T2.SessionStart < T1.StartHour AND T2.SessionEnd >= T1.EndHour)
GROUP BY
    T1.StartHour
ORDER BY
    T1.StartHour

The input needed is the day to aggregate, in YYYY-MM-DD form; substitute it for the hardcoded '2001-01-01' literal. Note that the query returns no row for hours where the count is zero.
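
For SQLite itself, the same idea might look like the sketch below, run through Python's sqlite3 module. It rests on a few assumptions: timestamps are stored as ISO-8601 text ('YYYY-MM-DD HH:MM:SS') so that string comparison orders them correctly, the SQLite build supports recursive CTEs (3.8.3 or later), and the sample rows are made up. The four OR branches above collapse into a single overlap test (session starts before the hour ends and ends at or after the hour starts), and a LEFT JOIN keeps the hours whose count is zero. No claim is made here about speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserSession (SessionStart TEXT, SessionEnd TEXT)")
# Made-up sample sessions, stored as ISO-8601 text.
conn.executemany("INSERT INTO UserSession VALUES (?, ?)", [
    ("2009-01-01 00:15:00", "2009-01-01 02:05:00"),
    ("2009-01-01 01:30:00", "2009-01-01 01:45:00"),
    ("2009-01-01 22:50:00", "2009-01-02 00:10:00"),
])

QUERY = """
WITH RECURSIVE Hours(n) AS (
    SELECT 0 UNION ALL SELECT n + 1 FROM Hours WHERE n < 23
),
Slots AS (
    -- One row per hour of the requested day, with its start and end.
    SELECT n AS Hour,
           datetime(:day, '+' || n || ' hours')       AS StartHour,
           datetime(:day, '+' || (n + 1) || ' hours') AS EndHour
    FROM Hours
)
SELECT s.Hour, COUNT(u.SessionStart) AS SessionCount
FROM Slots AS s
LEFT JOIN UserSession AS u
       ON u.SessionStart <  s.EndHour    -- started before the hour ended
      AND u.SessionEnd   >= s.StartHour  -- ended at or after the hour started
GROUP BY s.Hour
ORDER BY s.Hour
"""

counts = dict(conn.execute(QUERY, {"day": "2009-01-01"}).fetchall())
print(counts)
```

The `:day` parameter takes the place of the hardcoded date literal, and `counts` always has 24 entries because of the LEFT JOIN.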

Martin
+1  A: 

Perhaps you could have another table that, when logout times are recorded, populates records to determine hours when the user was logged in?

For example

create table hourlyUseLog (
    userID text not null,
    date float, -- Julian day
    hour0 integer default 0,
    hour1 integer default 0,

etc...

    hour23 integer default 0
);

If you had a structure like this, you could do very fast queries of who was logged in (or how many users were logged in) at any given time/date.

SQLite also supports bitwise operators on integers, so you could represent all of the hours in a day in a single integer and flip bits for the hours in which a user was active. This would allow even faster queries with bit masks, and would provide a mechanism to convert hours to Julian day (time-portion only) representations and/or use a bit-counting routine to calculate hours spent in the system.
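
The bit-mask idea could be sketched roughly as follows, again via Python's sqlite3 module. The `HourlyUseMask` table, its columns, and the helper functions are hypothetical names chosen for illustration; bit n of `hours` is assumed to mean "active during hour n" (0 to 23), and the day is stored as text rather than as a Julian day to keep the example short.

```python
import sqlite3

def hour_mask(first_hour, last_hour):
    """Return an integer with bits first_hour..last_hour (inclusive) set."""
    mask = 0
    for h in range(first_hour, last_hour + 1):
        mask |= 1 << h
    return mask

def mark_active(conn, user_id, day, first_hour, last_hour):
    """OR the hours of one session into the user's mask for that day."""
    conn.execute(
        "INSERT OR IGNORE INTO HourlyUseMask (userID, day) VALUES (?, ?)",
        (user_id, day))
    conn.execute(
        "UPDATE HourlyUseMask SET hours = hours | ? WHERE userID = ? AND day = ?",
        (hour_mask(first_hour, last_hour), user_id, day))

def users_in_hour(conn, day, hour):
    """Count users whose mask has the bit for the given hour set."""
    row = conn.execute(
        "SELECT COUNT(*) FROM HourlyUseMask "
        "WHERE day = ? AND (hours & (1 << ?)) != 0",
        (day, hour)).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per user per day, 24 hour flags in one integer.
conn.execute(
    "CREATE TABLE HourlyUseMask ("
    "userID TEXT NOT NULL, day TEXT NOT NULL, "
    "hours INTEGER NOT NULL DEFAULT 0, "
    "PRIMARY KEY (userID, day))")

mark_active(conn, "alice", "2009-01-01", 0, 2)
mark_active(conn, "alice", "2009-01-01", 2, 3)  # overlapping session, same bits reused
mark_active(conn, "bob", "2009-01-01", 1, 1)

alice_mask = conn.execute(
    "SELECT hours FROM HourlyUseMask WHERE userID = 'alice'").fetchone()[0]
alice_hours = bin(alice_mask).count("1")  # bit counting gives hours active
print(users_in_hour(conn, "2009-01-01", 1), alice_hours)
```

Because bits are OR-ed together, this layout records whether a user was active in an hour, not how many separate sessions they had in it.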

Also, if you need real-time activity reporting and your system allows you to have a centralized representation of who's logged in, you could fire an hourly batch process that updates the hourlyUseLog records.

xyzzycoder
I think this is the best way to go. Martin's answer is a lot cleaner than mine; however, it performs the same AND/OR operations as mine and is just as slow. I think I will go this route: an hourly log or rollup table is the best solution. Thank you all.
efess