ansaurus

Question

How to find the average time difference between rows in a table?

Answer 1

A:

Here's one way:

select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev 
    on cur.id = prev.id + 1 
    and cur.datecol <> prev.datecol

The first argument of the timestampdiff function sets the unit of the result, so you can choose between seconds, minutes, days, months, and so on.

If the ids are not consecutive, you can select the previous row by adding a rule that there are no other rows in between:

select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev 
    on prev.datecol < cur.datecol
    and not exists (
        select * 
        from table inbetween 
        where prev.datecol < inbetween.datecol
        and inbetween.datecol < cur.datecol
    )
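
As a rough, runnable sketch of the "no rows in between" query, the snippet below runs it against an in-memory SQLite database from Python. SQLite only stands in for MySQL here: it has no timestampdiff, so the difference is computed from strftime('%s', ...) values instead. The table name events and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical table name (events) and made-up sample rows.
# SQLite is used only because it runs in memory; the original query targets MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, datecol TEXT)")
conn.executemany(
    "INSERT INTO events (id, datecol) VALUES (?, ?)",
    [
        (1, "2009-05-18 09:00:00"),
        (4, "2009-05-18 09:10:00"),  # ids are deliberately not consecutive
        (9, "2009-05-18 09:40:00"),
    ],
)

# Same join shape as the query above; strftime('%s', ...) yields Unix seconds,
# so dividing the difference by 60.0 gives minutes.
query = """
SELECT AVG(
    (CAST(strftime('%s', cur.datecol) AS INTEGER)
     - CAST(strftime('%s', prev.datecol) AS INTEGER)) / 60.0
)
FROM events cur
INNER JOIN events prev
    ON prev.datecol < cur.datecol
    AND NOT EXISTS (
        SELECT 1
        FROM events inbetween
        WHERE prev.datecol < inbetween.datecol
          AND inbetween.datecol < cur.datecol
    )
"""
avg_minutes = conn.execute(query).fetchone()[0]
print(avg_minutes)
```

With these three rows the gaps are 10 and 30 minutes, so the average is 20 minutes. The pair (id 1, id 9) is skipped because id 4 lies in between.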

Andomar 2009-05-18 09:38:06

Answer 2

+1 A:

Are the IDs contiguous?

You could do something like:

SELECT 
      a.ID
      , b.ID
      , a.Timestamp 
      , b.Timestamp 
      , a.Timestamp - b.Timestamp as Difference -- a is the later row (a.ID = b.ID + 1)
FROM
     MyTable a
     JOIN MyTable b
          ON a.ID = b.ID + 1 AND a.Timestamp <> b.Timestamp

That'll give you a list of time differences on each consecutive row pair...

Then you could wrap that up in an AVG grouping...
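
One possible way to do the AVG wrapping is sketched below, run from Python against an in-memory SQLite database so it can be tried without a server. This is an assumption-laden illustration, not the exact query for any particular database: the sample rows are made up, the difference is taken as later row (a) minus earlier row (b) so it comes out positive, and it is computed in seconds with SQLite's strftime('%s', ...).

```python
import sqlite3

# Made-up sample rows with contiguous IDs; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, Timestamp TEXT)")
conn.executemany(
    "INSERT INTO MyTable (ID, Timestamp) VALUES (?, ?)",
    [
        (1, "2009-05-18 09:00:00"),
        (2, "2009-05-18 09:00:30"),
        (3, "2009-05-18 09:02:00"),
    ],
)

# The inner query lists one difference per consecutive pair;
# the outer query averages that list.
query = """
SELECT AVG(Difference)
FROM (
    SELECT CAST(strftime('%s', a.Timestamp) AS INTEGER)
         - CAST(strftime('%s', b.Timestamp) AS INTEGER) AS Difference
    FROM MyTable a
    JOIN MyTable b
        ON a.ID = b.ID + 1 AND a.Timestamp <> b.Timestamp
) AS diffs
"""
avg_seconds = conn.execute(query).fetchone()[0]
print(avg_seconds)
```

Here the two differences are 30 and 90 seconds, so the average is 60 seconds.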

Eoin Campbell 2009-05-18 09:40:34

Fixed.

Eoin Campbell 2009-05-18 09:47:02

OK, but this will work iff the ids ARE contiguous. Actually the answer from Nick is a better one I guess.

Bartosz Radaczyński 2009-05-18 10:34:58

Answer 3

+5 A:

If your table is t, and your timestamp column is ts, and you want the answer in seconds:

SELECT TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts) ) 
       /
       (COUNT(DISTINCT(ts)) -1) 
FROM t

This will be miles quicker for large tables, as it avoids the n-squared JOIN.

This uses a cute mathematical trick. Ignore the problem of duplicates for the moment. The average time difference between consecutive rows is the difference between the first timestamp and the last timestamp, divided by the number of rows minus 1.

Proof: The average distance between consecutive rows is the sum of the distances between consecutive rows, divided by the number of consecutive pairs. But the sum of the differences between consecutive rows telescopes to the distance between the first row and the last row (assuming they are sorted by timestamp), and the number of consecutive pairs is the total number of rows minus 1.
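
Written out, the argument looks like this. The symbols are introduced here only for illustration: assume the table has n rows whose timestamps, sorted in ascending order, are t_1, t_2, ..., t_n.

```latex
\[
\frac{1}{n-1}\sum_{i=1}^{n-1}\left(t_{i+1}-t_i\right)
  = \frac{(t_2-t_1)+(t_3-t_2)+\cdots+(t_n-t_{n-1})}{n-1}
  = \frac{t_n-t_1}{n-1}
\]
```

Every intermediate timestamp appears once with a plus sign and once with a minus sign, so only t_n (the MAX) and t_1 (the MIN) survive.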

Then we just count only the distinct timestamps, so that duplicate timestamps do not contribute extra zero-length gaps.
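
As a quick sanity check of the trick, the following Python sketch compares the pair-by-pair average with the (max - min) / (distinct count - 1) shortcut. The timestamps are made up and include one duplicate; nothing here is specific to MySQL.

```python
from datetime import datetime

# Made-up sample timestamps, including one duplicate.
stamps = [
    datetime(2009, 5, 18, 9, 0, 0),
    datetime(2009, 5, 18, 9, 0, 30),
    datetime(2009, 5, 18, 9, 0, 30),  # duplicate timestamp
    datetime(2009, 5, 18, 9, 2, 0),
]

distinct = sorted(set(stamps))

# Direct method: average the gap between each pair of consecutive distinct timestamps.
gaps = [(later - earlier).total_seconds()
        for earlier, later in zip(distinct, distinct[1:])]
direct = sum(gaps) / len(gaps)

# Shortcut: (max - min) / (number of distinct timestamps - 1).
shortcut = (max(stamps) - min(stamps)).total_seconds() / (len(distinct) - 1)

print(direct, shortcut)
```

Both values come out to 60 seconds for this data: the gaps are 30 and 90 seconds, and the shortcut is 120 seconds divided by 2.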

Nick Fortescue 2009-05-18 10:18:52

Thanks, this is awesome.

Bartosz Radaczyński 2009-05-18 10:35:42

+1 Great, you could replace MIN(DISTINCT(ts)) with MIN(ts) as far as I can see.

Andomar 2009-05-18 10:39:35

Brilliant. Great answer to getting around the possible duplication of timestamps.

Bell 2009-05-18 10:44:24

Thanks Andomar, removed unnecessary DISTINCTs

Nick Fortescue 2009-05-18 10:47:36
