I have a table where each row has a start and stop date-time. These can be arbitrarily short or long spans.

I want to query the total duration for which the rows overlap a given window, defined by a start and a stop date-time.

How can you do this in MySQL?

Or do you have to select the rows that intersect the query start and stop times, then calculate the actual overlap of each row and sum it client-side?


To give an example, using milliseconds to make it clearer:

Some rows:

ROW  START  STOP
1    1010   1240
2     950   1040
3    1120   1121

We want to know the total time that these rows spent between 1030 and 1100.

Let's compute the overlap of each row:

ROW  INTERSECTION
1    70
2    10
3     0

So the sum in this example is 80.
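
For comparison, the client-side variant of this arithmetic can be sketched in a few lines of Python. This is only a sketch: it assumes the rows have already been fetched as (start, stop) pairs, and the function names are made up for illustration.

```python
def overlap(start, stop, window_start, window_stop):
    """Length of the intersection of one row with the window, never negative."""
    return max(0, min(stop, window_stop) - max(start, window_start))


def total_overlap(rows, window_start, window_stop):
    """Sum of the per-row overlaps; rows is an iterable of (start, stop) pairs."""
    return sum(overlap(start, stop, window_start, window_stop) for start, stop in rows)


# The three example rows from above, in milliseconds.
rows = [(1010, 1240), (950, 1040), (1120, 1121)]
per_row = [overlap(start, stop, 1030, 1100) for start, stop in rows]
total = total_overlap(rows, 1030, 1100)
print(per_row, total)  # [70, 10, 0] 80
```

The max(0, ...) guard is what makes row 3 contribute 0 rather than a negative number.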

+1  A: 

I fear you're out of luck.

Since you don't know the number of rows that you will be "cumulatively intersecting", you need either a recursive solution, or an aggregation operator.

The aggregation operator you need is not an option, because SQL does not have the data type that it is supposed to operate on (that type being an interval type, as described in "Temporal Data and the Relational Model").

The recursive solution may be possible, but it is likely to be difficult to write and difficult for other programmers to read, and it is also questionable whether the optimizer can turn that query into the optimal data access strategy.

Or I misunderstood your question.

Erwin Smout
+2  A: 

If your example should have said 70 in the first row, then, assuming @range_start and @range_end are your condition parameters:

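SELECT SUM( LEAST(@range_end, stop) - GREATEST(@range_start, start) )
FROM your_table
WHERE @range_start < stop AND @range_end > start

(your_table is a placeholder for your table name; a bare Table would not parse, since TABLE is a reserved word in MySQL.)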

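Using GREATEST/LEAST together with MySQL's date functions (for example TIMESTAMPDIFF() in place of the plain subtraction, which only gives a duration for numeric columns), you should be able to get what you need operating directly on the date type.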
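
As a quick check of this query shape against the rows in the question, here is a sketch in Python using an in-memory SQLite database. Note that this is not MySQL: SQLite's multi-argument MAX() and MIN() stand in for GREATEST() and LEAST(), the values are plain integers (milliseconds) rather than date-times, and the table name spans is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spans (start INTEGER, stop INTEGER)")
conn.executemany(
    "INSERT INTO spans VALUES (?, ?)",
    [(1010, 1240), (950, 1040), (1120, 1121)],
)

# Same shape as the MySQL query, with MIN/MAX replacing LEAST/GREATEST.
query = """
    SELECT SUM(MIN(:range_end, stop) - MAX(:range_start, start))
    FROM spans
    WHERE :range_start < stop AND :range_end > start
"""
(total,) = conn.execute(query, {"range_start": 1030, "range_end": 1100}).fetchone()
print(total)  # 80
```

The WHERE clause matters: it drops rows that miss the range entirely (row 3 here), which would otherwise contribute a negative difference to the sum.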

Unreason
A: 

There's a fairly interesting solution if you know the maximum time you'll ever have. Create a table with all the numbers in it from one to your maximum time.

millisecond
-----------
1
2
3
...
1240

Call it time_dimension (this technique is often used in dimensional modelling in data warehousing.)

Then this:

SELECT 
  COUNT(*) 
FROM 
  your_data 
    INNER JOIN time_dimension ON time_dimension.millisecond BETWEEN your_data.start AND your_data.stop
WHERE 
  time_dimension.millisecond BETWEEN 1030 AND 1100

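...will give you the total number of milliseconds of running time between 1030 and 1100. Bear in mind that BETWEEN includes both end points, so each row is counted by its ticks, start and stop included; compare with >= and < instead if you need exact durations.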

Of course, whether you can use this technique depends on whether you can safely predict the maximum number of milliseconds that will ever be in your data.
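
To see what the join counts on the rows from the question, here is a sketch in Python against an in-memory SQLite database (not MySQL, though the join itself is ordinary SQL). It assumes a maximum of 1240 milliseconds, and the half-open variant is an addition for comparison, not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_data (start INTEGER, stop INTEGER)")
conn.executemany(
    "INSERT INTO your_data VALUES (?, ?)",
    [(1010, 1240), (950, 1040), (1120, 1121)],
)

# One row per millisecond from 1 up to the assumed maximum of 1240.
conn.execute("CREATE TABLE time_dimension (millisecond INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO time_dimension VALUES (?)",
    [(ms,) for ms in range(1, 1241)],
)

# Inclusive at both ends, as in the query above: end-point ticks are counted.
inclusive = conn.execute("""
    SELECT COUNT(*)
    FROM your_data
      INNER JOIN time_dimension
        ON time_dimension.millisecond BETWEEN your_data.start AND your_data.stop
    WHERE time_dimension.millisecond BETWEEN 1030 AND 1100
""").fetchone()[0]

# Half-open variant: tick t stands for the span from t up to t + 1.
half_open = conn.execute("""
    SELECT COUNT(*)
    FROM your_data
      INNER JOIN time_dimension
        ON time_dimension.millisecond >= your_data.start
       AND time_dimension.millisecond < your_data.stop
    WHERE time_dimension.millisecond >= 1030
      AND time_dimension.millisecond < 1100
""").fetchone()[0]

print(inclusive, half_open)  # 82 80
```

The inclusive form counts one extra tick for each overlapping row, which is usually what you want for whole days but not for durations.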

This is often used in data warehousing, as I said; it fits well with some kinds of problems -- for example, I've used it for insurance systems, where a total number of days between two dates was needed, and where the overall date range of the data was easy to estimate (from the earliest customer date of birth to a date a couple of years into the future, beyond the end date of any policies that were being sold.)

Might not work for you, but I figured it was worth sharing as an interesting technique!

Matt Gibson
A: 

After you added the example, it is clear that indeed I misunderstood your question.

You are not "cumulatively intersecting rows".

The steps that will bring you to a solution are:

1. Intersect each row's start and end point with the given start and end points. This should be doable using CASE expressions or something of that nature, something in the style of:

SELECT (CASE WHEN startdate < givenstartdate THEN givenstartdate ELSE startdate END) AS retainedstartdate, (likewise for enddate) AS retainedenddate FROM ...

Cater for nulls and that sort of stuff as needed.

2. With the retainedstartdate and retainedenddate, use a date function to compute the length of the retained interval (which is the overlap of your row with the given time section). Rows that do not overlap the given section at all end up with retainedenddate before retainedstartdate, so filter them out or count them as zero.

3. SELECT the SUM() of those.
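
These steps can be sketched as follows, again in Python against an in-memory SQLite database rather than MySQL. The table name spans and the parameter names are made up, and because the sample values are integer milliseconds a plain subtraction takes the place of the date function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spans (startdate INTEGER, enddate INTEGER)")
conn.executemany(
    "INSERT INTO spans VALUES (?, ?)",
    [(1010, 1240), (950, 1040), (1120, 1121)],
)
params = {"given_start": 1030, "given_end": 1100}

# Step 1: clamp each row to the given section, skipping rows that miss it.
retained_sql = """
    SELECT CASE WHEN startdate < :given_start THEN :given_start
                ELSE startdate END AS retainedstartdate,
           CASE WHEN enddate > :given_end THEN :given_end
                ELSE enddate END AS retainedenddate
    FROM spans
    WHERE startdate < :given_end AND enddate > :given_start
"""
retained = sorted(conn.execute(retained_sql, params).fetchall())

# Steps 2 and 3: length of each retained interval, then the sum of those.
total = conn.execute(
    f"SELECT SUM(retainedenddate - retainedstartdate) FROM ({retained_sql})",
    params,
).fetchone()[0]

print(retained, total)  # [(1030, 1040), (1030, 1100)] 80
```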

Erwin Smout