views:

51

answers:

2

I have a database table of events that happened:

(timestamp, other data...)

I want to group these by things that happened at 'pretty much the same time'. That is, when ordered by timestamp, such that all events in each group are within X seconds (e.g., X=3) of some other event in that group, and more than X seconds from all events in other groups.

Is there a way to do this even somewhat efficiently in SQL, or should I just ORDER BY timestamp, pull data into my app, and do it there?

+1  A: 

You could use UNIX_TIMESTAMP and DIV to calculate values that would be the same for events that happened "at the same time".

The following counts the number of events in 10 second intervals:

SELECT UNIX_TIMESTAMP(timestamp) DIV 10, COUNT(*)
FROM events
GROUP BY 1;
Josef
A: 

Sometimes I want to do something like this with some access logs we have. My data looks like:

EventID | UserID | When                | What
--------|--------|---------------------|--------
  7477  |   33   | 20090614:140517.131 | ...
  7478  |   33   | 20090614:140518.992 | ...
  7479  |   33   | 20090614:140522.020 | ...
  7480  |   33   | 20090614:142719.001 | ...
  7481  |   33   | 20090614:142720.668 | ...

Then I want to identify a "session" by userid and whether the times "lump", which is how I'm reading your statement. So, from the above:

 UserId | SessionStart       | Stuff
--------|--------------------|---------
   33   | 6/14/2009 14:05:17 | ...
   33   | 6/14/2009 14:27:19 | ...

I do this in SQL, using SQL Server. My strategy in this case is:

  1. Group by user
  2. Identify the delta between two records, per row.
  3. Make a IsNewSession column with 1 if the delta exceeds my threshold, else 0. This record is the time/date of the new session.
  4. Make a SessionNumber column which is the running total of IsNewSession. You can then use this number to identify the records in the session, group on them, etc.

In SQL Server, using a temp table, it's pretty quick. Using a single SQL statement, it quickly gets very slow. In both cases, it's really ugly. Oracle, on the other hand, has a nice set of analytic functions to handle the delta and the running total which makes the code both cleaner and (usually) quicker.

If mysql doesn't have any such magic, and if your team isn't particularly enamored with SQL, I'd recommend you just consider doing it in your app for maintainable production code.

Below is the sanitized version of what I'm using. If you want the "single SQL statement" version, let me know. Apologies for giving you SQL Server code instead of mysql. :)

-- Set up work table
DROP TABLE #temp
CREATE TABLE #temp
(
    ID INT PRIMARY KEY,
    EventDate DATETIME,
    RecordRank INT,
    IsNewSession INT,
    SessionNum INT
);

DECLARE
    @NumSecondsBetweenSessions INT,
    @StartDate DATETIME,
    @EndDate DATETIME
;

SELECT
    @NumSecondsBetweenSessions = 600,
    @StartDate = '20000101',
    @EndDate = '20201231'
;

-- Set up what will be our "Current" records in the "Current vs
-- Previous" comparision.
INSERT INTO #temp
(
    ID,
    EventDate,
    RecordRank,
    IsNewSession,
    SessionNum
)
SELECT
    SL.ID,
    SL.Created_DateTime,
    ROW_NUMBER() OVER (ORDER BY SL.Created_DateTime ASC) AS RecordRank,
    0,
    0
FROM
    SystemLog SL
WHERE
    SL.Created_DateTime BETWEEN @StartDate and @EndDate
;

-- Checking the time delta between the Current and Previous
-- records to see if we have a new session.
UPDATE #temp
SET
    IsNewSession = 
        CASE
            WHEN PrevT.EventDate IS NULL THEN 1
            WHEN DATEDIFF(s, PrevT.EventDate, #temp.EventDate) > @NumSecondsBetweenSessions THEN 1
            ELSE 0
        END
FROM
    #temp
    LEFT OUTER JOIN #temp PrevT
    ON #temp.RecordRank = (PrevT.RecordRank + 1)
;

-- This is performing a "running total" on IsNewSession to assign
-- records to a specific Session.
DECLARE @SessionNum INT;
SET @SessionNum = 0;
UPDATE #temp
SET
    @SessionNum = @SessionNum + IsNewSession,
    SessionNum = @SessionNum
;

-- The results.
SELECT
    T.*,
    SL.*
FROM
    #temp T
    JOIN SystemLog SL
    ON SL.ID = T.ID
ORDER BY
    RecordRank ASC
;
Matt