views:

701

answers:

3

I have a table of events, each with a StartTime and EndTime (as type DateTime) in a MySQL Table.

I'm trying to output the sum of overlapping times and the number of events that overlapped.

What is the most efficient / simple way to perform this query in MySQL?

CREATE TABLE IF NOT EXISTS `events` (
  `EventID` int(10) unsigned NOT NULL auto_increment,
  `StartTime` datetime NOT NULL,
  `EndTime` datetime default NULL,
  PRIMARY KEY  (`EventID`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=37 ;


INSERT INTO `events` (`EventID`, `StartTime`, `EndTime`) VALUES
(10001, '2009-02-09 03:00:00', '2009-02-09 10:00:00'),
(10002, '2009-02-09 05:00:00', '2009-02-09 09:00:00'),
(10003, '2009-02-09 07:00:00', '2009-02-09 09:00:00');


# if the query was run using the data above,
# the table below would be the desired output

# Number of Overlapped Events | Total Amount of Time those events overlapped.
1, 03:00:00
2, 02:00:00
3, 02:00:00

The purpose of these results is to generate a bill for hours used. (if you have one event running, you might pay 10 dollars per hour. But if two events are running, you only have to pay 8 dollars per hour, but only for the period of time you had two events running.)

A: 

Start with a table that contains a single datetime field as its primary key, and populate that table with every time value you're interested in. A leap years has 527040 minutes (31622400 seconds), so this table might get big if your events span several years.

Now join against this table doing something like

SELECT i.dt as instant, count(*) as events
FROM instant i JOIN event e ON i.dt BETWEEN e.start AND e.end
GROUP BY i.dt
WHERE i.dt BETWEEN ? AND ?

Having an index on instant.dt may let you forgo an ORDER BY.

If events are added infrequently, this may be something you want to precalculate by running the query offline, populating a separate table.

Dave W. Smith
A: 

I would suggest an in-memory structure that has start-time,end-time,#events... (This is simplified as time(hours), but using unix time gives up to the second accuracy)

For every event, you would insert the new event as-is if there's no overlap, otherwise, find the overlap, and split the event to (up to 3) parts that may be overlapping, With your example data, starting from the first event:

Event 1 starts at 3am and ends at 10am: Just add the event since no overlaps:

    3,10,1

Event 2 starts at 5am and ends at 9am: Overlaps,so split the original, and add the new one with extra "#events"

    3,5,1
    5,9,2
    9,10,1

Event 3 starts at 7am and ends at 9am: also overlaps, do the same with all periods:

    3,5,1
    5,7,2
    7,9,3
    9,10,1

So calculating the overlap hours per #events:

1 event= (5-3)+(10-9)=3 hours
2 events = 7-5 = 2 hours
3 events = 9-7 = 2 hours

It would make sense to run this as a background process if there are many events to compare.

Osama ALASSIRY
+2  A: 

Try this:

SELECT `COUNT`, SEC_TO_TIME(SUM(Duration))
FROM (
    SELECT
        COUNT(*) AS `Count`,
        UNIX_TIMESTAMP(Times2.Time) - UNIX_TIMESTAMP(Times1.Time) AS Duration
    FROM (
        SELECT @rownum1 := @rownum1 + 1 AS rownum, `Time`
        FROM (
            SELECT DISTINCT(StartTime) AS `Time` FROM events
            UNION
            SELECT DISTINCT(EndTime) AS `Time` FROM events
        ) AS AllTimes, (SELECT @rownum1 := 0) AS Rownum
        ORDER BY `Time` DESC
    ) As Times1
    JOIN (
        SELECT @rownum2 := @rownum2 + 1 AS rownum, `Time`
        FROM (
            SELECT DISTINCT(StartTime) AS `Time` FROM events
            UNION
            SELECT DISTINCT(EndTime) AS `Time` FROM events
        ) AS AllTimes, (SELECT @rownum2 := 0) AS Rownum
        ORDER BY `Time` DESC
    ) As Times2
    ON Times1.rownum = Times2.rownum + 1
    JOIN events ON Times1.Time >= events.StartTime AND Times2.Time <= events.EndTime
    GROUP BY Times1.rownum
) Totals
GROUP BY `Count`

Result:

1, 03:00:00
2, 02:00:00
3, 02:00:00

If this doesn't do what you want, or you want some explanation, please let me know. It could be made faster by storing the repeated subquery AllTimes in a temporary table, but hopefully it runs fast enough as it is.

Mark Byers
+1 for a most elegant solution.
Anax
On my MySQL server, it wont run unless 'times2.rownum ' is changed to 'Times2.rownum' but otherwise, that's exactly what I was looking for! Works perfectly! Thanks!
maxsilver
Sorry about that - it was a typo, and it didn't fail for me so I didn't notice it. Fixed now. Glad you got it working! :)
Mark Byers
So apparently maxsilver ran the query on Linux and Mark on Windows.
Anax
@Anax: You're right that I tested the query on Windows. :)
Mark Byers