In general, I need to associate (group) records which are created in similar time periods. If it helps, thinking of the example below as clickstream data where there is no sessionID and I need to build those sessions.
I have the following dataset:
UserId INT,
EventId INT,
DateCreated DATETIME,
BlockId INT
Assume the following data:
{123, 111, '2009-12-01 9:15am', NULL}
{123, 222, '2009-12-01 9:20am', NULL}
{123, 333, '2009-12-01 9:25am', NULL}
{123, 444, '2009-12-03 2:30pm', NULL}
{123, 555, '2009-12-03 2:32pm', NULL}
What I need to do is divide these events up, by user, into temporal buckets. There is a business rule that says anything > 30 minutes should be a new bucket. In the above example, events 111-333 represent a block, i.e. not more than 30 minutes separates them. Likewise, events 444-555 represent a second block.
My current solution uses a cursor and is extremely slow (therefore, unsustainable for the amount of data I need to process). I can post the code but it is pretty simple.
Any ideas?