I have a MySQL database that stores some timestamps. Assume the table contains only an ID and a timestamp. The timestamps might be duplicated.

I want to find the average time difference between consecutive rows that are not duplicates (timewise). Is there a way to do it in SQL?

A: 

Here's one way:

select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev 
    on cur.id = prev.id + 1 
    and cur.datecol <> prev.datecol

The timestampdiff function allows you to choose between days, months, seconds, and so on.

If the IDs are not consecutive, you can select the previous row by adding a rule that no other row falls in between:

select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev 
    on prev.datecol < cur.datecol
    and not exists (
        select * 
        from table inbetween 
        where prev.datecol < inbetween.datecol
        and inbetween.datecol < cur.datecol
    )
Andomar
+1  A: 

Are the IDs contiguous?

You could do something like:

SELECT 
      a.ID
      , b.ID
      , a.Timestamp 
      , b.Timestamp 
      , TIMESTAMPDIFF(SECOND, b.Timestamp, a.Timestamp) as Difference
FROM
     MyTable a
     JOIN MyTable b
          ON a.ID = b.ID + 1 AND a.Timestamp <> b.Timestamp

That'll give you a list of time differences on each consecutive row pair...

Then you could wrap that up in an AVG grouping...

Eoin Campbell
Fixed.
Eoin Campbell
OK, but this will work iff the ids ARE contiguous. Actually the answer from Nick is a better one I guess.
Bartosz Radaczyński
+5  A: 

If your table is t, and your timestamp column is ts, and you want the answer in seconds:

SELECT TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts) ) 
       /
       (COUNT(DISTINCT(ts)) -1) 
FROM t

This will be miles quicker for large tables, as it has no n-squared JOIN.

This uses a cute mathematical trick which helps with this problem. Ignore the problem of duplicates for the moment. The average time difference between consecutive rows is the difference between the first timestamp and the last timestamp, divided by the number of rows minus 1.

Proof: The average distance between consecutive rows is the sum of the distances between consecutive rows, divided by the number of consecutive pairs. But the sum of the differences between consecutive rows is just the distance between the first row and the last row (assuming they are sorted by timestamp), because every intermediate timestamp cancels out. And the number of consecutive pairs is the total number of rows minus 1.
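
Written out, the argument is a telescoping sum. The notation is my own and not part of the answer: assume the sorted timestamps are $t_1 \le t_2 \le \dots \le t_n$ with $n \ge 2$.

```latex
\[
\frac{1}{n-1}\sum_{i=1}^{n-1}\bigl(t_{i+1}-t_i\bigr)
  = \frac{(t_2-t_1)+(t_3-t_2)+\dots+(t_n-t_{n-1})}{n-1}
  = \frac{t_n-t_1}{n-1}
\]
```

Here $t_n - t_1$ corresponds to `TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts))` in the query.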

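Then we just condition the timestamps to be distinct. Duplicates do not change the first or last timestamp, so only the row count needs adjusting: COUNT(DISTINCT(ts)) - 1 is the number of gaps between distinct timestamps.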
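
As a quick sanity check outside the database, here is a minimal Python sketch that compares the shortcut with a plain pairwise average. The function names and the sample data are made up for illustration, and both functions assume at least two distinct timestamps.

```python
from datetime import datetime


def avg_gap_pairwise(timestamps):
    """Average gap in seconds between consecutive distinct timestamps."""
    distinct = sorted(set(timestamps))
    gaps = [(b - a).total_seconds() for a, b in zip(distinct, distinct[1:])]
    return sum(gaps) / len(gaps)


def avg_gap_shortcut(timestamps):
    """Same quantity computed as (max - min) / (distinct count - 1)."""
    span = (max(timestamps) - min(timestamps)).total_seconds()
    return span / (len(set(timestamps)) - 1)


# Hypothetical sample data: 10:00:00 appears twice.
sample = [
    datetime(2009, 1, 1, 10, 0, 0),
    datetime(2009, 1, 1, 10, 0, 0),
    datetime(2009, 1, 1, 10, 1, 0),
    datetime(2009, 1, 1, 10, 4, 0),
]

print(avg_gap_pairwise(sample))  # 120.0 (gaps of 60 s and 180 s)
print(avg_gap_shortcut(sample))  # 120.0 (240 s span over 2 gaps)
```

With only one distinct timestamp both functions divide by zero. In MySQL the equivalent query returns NULL in that case under the default settings, since division by zero yields NULL.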

Nick Fortescue
Thanks, this is awesome.
Bartosz Radaczyński
+1 Great, you could replace MIN(DISTINCT(ts)) with MIN(ts) as far as I can see.
Andomar
Brilliant. Great answer to getting around the possible duplication of timestamps.
Bell
Thanks Andomar, removed unnecessary DISTINCTs
Nick Fortescue