ansaurus

Question

Group events by temporal distance in SQL

Answer 1

+1 A:

Based on comment thread,

A. Buckets are defined by the first record in the bucket, and the first record in each Bucket is defined as any row where the DateCreated is more than 30 minutes after the latest earlier DateCreated. (immediately previous record)

B. The rest of the rows in the bucket are all rows with DateCreated on or after the First Row whose DateCreated is less than 30 minutes after the immediately previous row, and there does not exist a non-qualifying, (or new bucket-defining), row since the specified Bucket-defining row.

In English:

Select The DateCreated of those records wheret he DateCreated is more than 30 minutes after the previous DateCreated and aggregate function of your choice on all the other records in table whose DateCreated is after that bucket-defining datecreated, less than 30 minutes after it's immedialte previous DateCreated, and there are no records between the bucket-defining DateCreated and this one which follow a greater than 30 minute gap.

In SQL:

  Select Z.BucketDefinitionDate , Count(*) RowsInBucket
  From (Select Distinct DateCreated BucketDefinitionDate 
        From Table Ti
        Where DateCreated > DateAdd(minute, 30, 
            (Select Max(DateCreated) From Table
             Where DateCreated < Ti.DateCreated))) Z
    Join Table B 
       On B.DateCreated > Z.BucketDefinitionDate 
           And Not Exists
             (Select * From Table
              Where DateCreated Between Z.BucketDefinitionDate 
                                    And B.DateCreated 
                 And DateCreated > DateAdd(minute, 30,
                     (Select Max(DateCreated) From Table
                      Where DateCreated < B.DateCreated))) 
  Group By Z.BucketDefinitionDate

Charles Bretana 2009-12-09 17:08:28

I wouldn't be posting if that solution would work :-) The caveat is that events can (a) occur in contiguous blocks longer than 30 minutes and (b) blocks can cross 30-min boundaries

austincav 2009-12-09 17:18:13

You have to have some set-based rule as to what defines a "bucket" or a "block" which will define the bucket... What is that rule

Charles Bretana 2009-12-09 17:22:31

Your question states "is a business rule that says anything > 30 minutes should be a new bucket" What does this "> 30 minutes" start from ? The previous row? So that if there is a row every 29 minutes they will all fall in same bucket? or from the first row in the bucket ? or from the last row in the previous bucket ?

Charles Bretana 2009-12-09 17:24:00

Yes - per the example, the rule is about the difference between one record and the next in sequential order.

austincav 2009-12-09 17:41:27

So more accurately a bucket is a group of events where the beginning and end event are at least 30 minutes removed from the next contiguous (time ordered) event. Does that make sense?I am basically trying to "cluster" them together based on timing.

austincav 2009-12-09 17:43:08

And I do not have the buckets yet - the algorithm I want is going to *create* the buckets (whether it is by ID or something else, I don't care... just need a way to group them).

austincav 2009-12-09 17:43:55

Maybe you should post the cursor code so we can see what the rules are

Ray 2009-12-09 17:54:43

Replacing a poorly phrased comment - what happens if you have an event from the same user every 5 minutes? Do you get 6 events into a bucket, and then start a new bucket? Or do they all go in the same bucket because there is no 30 minute gap between any events? Or something else?

Ray 2009-12-09 18:04:40

Based on new comments, I have rewritten query... try it now...

Charles Bretana 2009-12-09 20:58:37

Answer 2

A:

What you can try is

DECLARE @TABLE TABLE(
     ID INT,
     EventID INT,
     DateCreated DATETIME
)

INSERT INTO @TABLE SELECT 123, 111, '2009-12-01 9:15am'
INSERT INTO @TABLE SELECT 123, 222, '2009-12-01 9:20am' 
INSERT INTO @TABLE SELECT 123, 333, '2009-12-01 9:25am'
INSERT INTO @TABLE SELECT 123, 444, '2009-12-03 2:30pm'
INSERT INTO @TABLE SELECT 123, 555, '2009-12-01 2:32pm'

SELECT  ID,
     DATEADD(dd, DATEDIFF(dd,0,DateCreated), 0) DayVal,
     DATEPART(hh, DateCreated) HourPart,
     FLOOR(DATEPART(mi, DateCreated) / 30.) MinBucket
FROM    @TABLE

Now you can group by DayVal, HourPart and MinBucket.

astander 2009-12-09 17:12:12

See note for previous answer, I think your solution suffers from same problem.

austincav 2009-12-09 17:19:03

You will find this bucketing EXTEMELY difficult once the occurences apreat 5 mins apart. How do you decide where to start a given bucket and where to end? That will then revert to a cursor query...

astander 2009-12-09 17:56:59

I know it is difficult. And a cursor is the easy--though slow--answer. I thought the smart people of StackOverflow could help me come up w/ a better answer.

austincav 2009-12-09 18:06:03

Answer 3

+1 A:

Hopefully this will get you going in the right direction. If you're in an SP then using table variables for the StartTimes and EndTimes should make the query much easier to read and understand. This will give you start and end times for your batches, then just join back to your table and you should have it.

;WITH StartTimes AS
(
SELECT DISTINCT
    T1.DateCreated AS StartTime
FROM
    My_Table T1
LEFT OUTER JOIN My_Table T2 ON
    T2.UserID = T1.UserID AND
    T2.EventID = T1.EventID AND
    T2.DateCreated >= DATEADD(mi, -30, T1.DateCreated) AND
    T2.DateCreated < T1.DateCreated
WHERE
    T2.UserID IS NULL
)
SELECT
    StartTimes.StartTime,
    EndTimes.EndTime
FROM
(
SELECT DISTINCT
    T3.DateCreated AS EndTime
FROM
    My_Table T3
LEFT OUTER JOIN My_Table T4 ON
    T4.UserID = T3.UserID AND
    T4.EventID = T3.EventID AND
    T4.DateCreated <= DATEADD(mi, 30, T3.DateCreated) AND
    T4.DateCreated > T3.DateCreated
WHERE
    T4.UserID IS NULL
) AS ET
INNER JOIN StartTimes ST ON
    ST.StartTime <= ET.EndTimes
LEFT OUTER JOIN StartTimes ST2 ON
    ST2.StartTime <= ET.EndTimes AND
    ST2.StartTime > ST.StartTime
WHERE
    ST2.StartTime IS NULL

Tom H. 2009-12-09 17:41:06

You may be on to something here! let me know and get back to you. Thanks man.

austincav 2009-12-09 18:06:49

Answer 4

A:

I think I have something for you. it is not a cool single query like Tom H posted, but it seems to work. It uses a table variable as a working table.

declare @table table(
    id int identity(1,1),
    userId int,
    eventId int,dateCreated datetime,
    bucket int
)

insert into @table select 123, 111, '2009-12-01 9:15am', 0
// etc... insert more rows - note that the 'bucket' field is set to 0

declare @next_bucket int
set @next_bucket = 1
update @table 
    set bucket = @next_bucket, @next_bucket = @next_bucket + 1
    from @table as [current]
    where datecreated > dateadd(mi, 30, (select datecreated from @table as previous where [current].id = previous.id + 1))

update @table 
    set bucket =
     coalesce(( select max(bucket)
       from @table as previous
       where previous.id < [current].id
        and bucket <> 0
     ), 1)

    from @table as [current]
    where bucket = 0

-- return the results
select * from @table

Ray 2009-12-09 18:41:25

ansaurus

tags:

views:

answers:

Group events by temporal distance in SQL

related questions