views:

81

answers:

3

Let's say I have two entities: Event and Activity

An Event is something that happens at (seemingly) random times, like Sunrise, Sunset, Storm, Fog, etc.

I have a table for this:

create table Event (
eventKey int,
eventDesc varchar(100),
started datetime
)

 EventKey  | EventDesc  | Started
 1           "Sunset"     2009-07-03 6:51pm 
 2           "Sunrise"    2009-07-04 5:33am
 3           "Fog"        2009-07-04 5:52pm
 4           "Sunset"     2009-07-04 6:49pm
 5           "Full Moon"  2009-07-04 10:12pm
 6           "Sunrise"    2009-07-05 5:34am

Then I have a table of activities that people participated in, and which events they relate to (i.e. an action could be long-running and cross multiple events: "Camp-out over the weekend"):

create table EventTask (
activityKey int,
activityDesc varchar(100),
startEventKey int,
endEventKey int
)

ActivityKey  |  ActivityDesc | StartEventKey | EndEventKey
123             "Camp-out"     1               5
234             "Drive home"   6               6

I want to output a timeline of actions that are marked by the events that happened:

ActivityKey  |  ActivityDesc | EventKey  | EventDesc
123             "Camp-out"     1           "Sunset"
123             "Camp-out"     2           "Sunrise"
123             "Camp-out"     3           "Fog"
123             "Camp-out"     4           "Sunset"
123             "Camp-out"     5           "Full Moon"
234             "Drive Home"   6           "Sunrise"

Is it possible to write a query that will do this in linear time, similar to this question? Please also recommend indexes or any other optimizations you can think of. The current solution is written in C#, but I would love a fast SQL solution.

What is the optimal query to do this?

+1  A: 
/*
create table Event (
eventKey int,
eventDesc varchar(100),
started timestamp
);

 insert into event values( 1,           'Sunset' ,    '2009-07-03 6:51pm');
 insert into event values(2,           'Sunrise',    '2009-07-04 5:33am');
 insert into event values(3,           'Fog'     ,   '2009-07-04 5:52pm');
 insert into event values(4,           'Sunset'   ,  '2009-07-04 6:49pm');
 insert into event values(5,           'Full Moon',  '2009-07-04 10:12pm');
 insert into event values(6,           'Sunrise'   , '2009-07-05 5:34am');

select * from event;

create table EventTask (
activityKey int,
activityDesc varchar(100),
startEventKey int,
endEventKey int
)

insert into eventtask values(123 ,            'Camp-out',     1 ,              5);
insert into eventtask values(234,             'Drive home',   6,               6);

select * from eventtask;

*/

select a.activitykey, a.activitydesc, b.eventkey, b.eventdesc
from
        eventtask a
join    event b on b.eventkey between a.starteventkey and a.endeventkey
order by
        a.activitykey, b.eventkey;

 activitykey     activitydesc     eventkey     eventdesc    
 --------------  ---------------  -----------  ------------ 
 123             Camp-out         1            Sunset       
 123             Camp-out         2            Sunrise      
 123             Camp-out         3            Fog          
 123             Camp-out         4            Sunset       
 123             Camp-out         5            Full Moon    
 234             Drive home       6            Sunrise      

 6 record(s) selected [Fetch MetaData: 3/ms] [Fetch Data: 1/ms] 

 [Executed: 7/7/09 4:24:34 PM EDT ] [Execution: 15/ms]

If your tables are large, you would definitely want indexes on event.eventkey, eventtask.starteventkey and eventtask.endeventkey.

Note that indexes improve query speed but slow insert and update.

Here's the version which does NOT require the event.eventkey column to have significance (more correct):

select a.activitykey, a.activitydesc, d.eventkey, d.eventdesc
from
        eventtask a
join    event     b on b.eventkey = a.starteventkey
join    event     c on c.eventkey = a.endeventkey
join    event     d on d.started between b.started and c.started
order by
        a.activitykey, d.started;

 activitykey     activitydesc     eventkey     eventdesc    
 --------------  ---------------  -----------  ------------ 
 123             Camp-out         1            Sunset       
 123             Camp-out         2            Sunrise      
 123             Camp-out         3            Fog          
 123             Camp-out         4            Sunset       
 123             Camp-out         5            Full Moon    
 234             Drive home       6            Sunrise      

 6 record(s) selected [Fetch MetaData: 2/ms] [Fetch Data: 0/ms] 

 [Executed: 7/8/09 10:01:25 AM EDT ] [Execution: 4/ms]
N. Zax
very interesting solution! I wouldn't have imagined doing it this way, but I like it.
Jeff Meatball Yang
+1  A: 

I would redefine the activity table so has a startTime and an EndTime, rather than base it on the random events. Then if I really want to see what 'events' were taking place during that time, I would join on the time range. This makes more sense from an OO/flexibility perspective, though you will see a larger performance cost.

declare @Event table(
id int,
name varchar(100),
[time] datetime
);

 insert into @Event values(1, 'Sunset', '2009-07-03 6:51pm');
 insert into @Event values(2, 'Sunrise', '2009-07-04 5:33am');
 insert into @Event values(3, 'Fog', '2009-07-04 5:52pm');
 insert into @Event values(4, 'Sunset', '2009-07-04 6:49pm');
 insert into @Event values(5, 'Full Moon', '2009-07-04 10:12pm');
 insert into @Event values(6, 'Sunrise', '2009-07-05 5:34am');

select * from @Event;

declare @Activity table (
id int,
name varchar(100),
startTime datetime,
endTime datetime
)

insert into @Activity values(123, 'Camp-out', '2009-07-03 6:00pm', '2009-07-05 5:00am');
insert into @Activity values(234, 'Drive home', '2009-07-05 5:00am', '2009-07-05 6:00am');

select *
from @Activity A
join @Event E on E.[time] > A.startTime and E.[time] < A.endTime
order by A.startTime
Rob Elliott
+1  A: 

I recently wrote up two ways to optimize such queries (joins on BETWEEN conditions): Using CROSS APPLY to optimize joins on BETWEEN conditions

Possible query (impossible to test without sample INSERTs):

SELECT et.activityKey,
et.activityDesc,
e.*
FROM Event AS e CROSS APPLY(SELECT TOP 1 * FROM EventTask  AS et
WHERE et.startEventKey <= e.started
AND e.started < endEventKey 
ORDER BY et.startEventKey
) AS et
AlexKuznetsov
what is the query I would write?
Jeff Meatball Yang