The root problem: I have an application which has been running for several months now. Users have been reporting that it's been slowing down over time (so in May it was quicker than it is now). I need to get some evidence to support or refute this claim. I'm not interested in precise numbers (so I don't need to know that a login took 10 seconds), I'm interested in trends - that something which used to take x seconds now takes of the order of y seconds.

The data I have is an audit table which stores a single row each time the user carries out any activity - it includes a primary key, the user id, a date time stamp and an activity code:

create table AuditData (
    AuditRecordID int identity(1,1) not null, 
    DateTimeStamp datetime not null,
    DateOnly datetime null,
    UserID nvarchar(10) not null,
    ActivityCode int not null)

(Note: DateOnly (datetime) is the DateTimeStamp with the time stripped off, to make grouping by day for daily analysis easier - it's effectively duplicate data to make querying faster.)
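For reference, one common way to strip the time portion on older SQL Server versions is the DATEADD/DATEDIFF trick; how DateOnly actually gets populated isn't shown here, so the backfill below is only an assumption:

-- Hypothetical backfill of DateOnly (assumes it isn't already maintained at insert time)
UPDATE AuditData
SET DateOnly = DATEADD(dd, DATEDIFF(dd, 0, DateTimeStamp), 0)
WHERE DateOnly IS NULL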

Also, for the sake of ease, you can assume that the ID is assigned in date/time order, that is 1 will always be before 2 which will always be before 3 (if this isn't true I can make it so).

ActivityCode is an integer identifying the activity which took place, for instance 1 might be user logged in, 2 might be user data returned, 3 might be search results returned and so on.

Sample data for those who like that sort of thing...:

1, 01/01/2009 12:39, 01/01/2009, P123, 1
2, 01/01/2009 12:40, 01/01/2009, P123, 2
3, 01/01/2009 12:47, 01/01/2009, P123, 3
4, 01/01/2009 13:01, 01/01/2009, P123, 3

User data is returned (Activity Code 2) immediately after login (Activity Code 1), so this can be used as a rough benchmark of how long the login takes (as I said, I'm interested in trends, so as long as I'm measuring the same thing for May as for July it doesn't matter so much if this isn't the whole login process - it takes in enough of it to give a rough idea).

(Note: User data can also be returned under other circumstances so it's not a one to one mapping).

So what I'm looking to do is select the average time between login (say ActivityCode 1) and the first instance after that, for that user on that day, of user data being returned (say ActivityCode 2).

I can do this by going through the table with a cursor, getting each login instance and then doing a select to get the earliest user-data return following it for that user on that day, but that's obviously not optimal and is slow as hell.

My question is (finally) - is there a "proper" SQL way of doing this using self joins or similar, without using cursors or some similar procedural approach? I can create views and whatever to my heart's content; it doesn't have to be a single select.

I can hack something together but I'd like to make the analysis I'm doing a standard product function so would like it to be right.

+1  A: 
SELECT TheDay, AVG(TimeTaken) AvgTimeTaken
FROM (
    SELECT
        CONVERT(DATE, logins.DateTimeStamp) TheDay
        , DATEDIFF(SS, logins.DateTimeStamp,
            (SELECT TOP 1 DateTimeStamp
             FROM AuditData userinfo
             WHERE userinfo.UserID = logins.UserID
               AND userinfo.ActivityCode = 2
               AND userinfo.DateTimeStamp > logins.DateTimeStamp
             ORDER BY userinfo.DateTimeStamp)   -- ORDER BY is needed so TOP 1 really is the first following row
          ) TimeTaken
    FROM AuditData logins
    WHERE logins.ActivityCode = 1
) LogInTimes
GROUP BY TheDay

This might be dead slow in real world though.
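If it is, an index along these lines should help the correlated subquery seek straight to the next activity-code-2 row per user (the index name and exact column order are only a suggestion, not something from the question):

CREATE INDEX IX_AuditData_User_Activity_Time
    ON AuditData (UserID, ActivityCode, DateTimeStamp)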

idstam
Thanks, I'll give it a go. It would have to go it some to be slower than the current cursor style cludge I'm using though.
Jon Hopkins
+1  A: 

In Oracle this would be a cinch, because of analytic functions. In this case, LAG() makes it easy to find the matching pairs of activity codes 1 and 2 and also to calculate the trend. As you can see, things got worse on 2nd JAN and improved quite a bit on the 3rd (I'm working in seconds rather than minutes).

select DateOnly
       , elapsed_time
       , elapsed_time - lag (elapsed_time) over (order by DateOnly) as trend
from
    (
    select DateOnly
           , avg(databack_time - prior_login_time) as elapsed_time
    from
        ( select DateOnly
                 , databack_time
                 , ActivityCode
                 , lag(login_time) over (order by DateOnly, UserID, AuditRecordID, ActivityCode) as prior_login_time
          from
              (
              select a1.AuditRecordID
                     , a1.DateOnly
                     , a1.UserID
                     , a1.ActivityCode
                     , to_number(to_char(a1.DateTimeStamp, 'SSSSS')) as login_time
                     , 0 as databack_time
              from   AuditData a1
              where  a1.ActivityCode = 1
              union all
              select a2.AuditRecordID
                     , a2.DateOnly
                     , a2.UserID
                     , a2.ActivityCode
                     , 0 as login_time
                     , to_number(to_char(a2.DateTimeStamp, 'SSSSS')) as databack_time
              from   AuditData a2
              where  a2.ActivityCode = 2
              )
        )
    where ActivityCode = 2
    group by DateOnly
    )

DATEONLY  ELAPSED_TIME      TREND
--------- ------------ ----------
01-JAN-09          120
02-JAN-09          600        480
03-JAN-09          150       -450


Like I said in my comment I guess you're working in MSSQL. I don't know whether that product has any equivalent of LAG().
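For what it's worth, later versions of SQL Server (2012 onwards) did add LAG(); on such a version the same idea might look roughly like this (an untested sketch against the AuditData table above, pairing each code-2 row with the code-1 row immediately before it for the same user and day):

SELECT DateOnly,
       AVG(DATEDIFF(SS, prior_time, DateTimeStamp)) AS avg_elapsed_seconds
FROM (
    SELECT DateOnly,
           ActivityCode,
           DateTimeStamp,
           LAG(DateTimeStamp) OVER (PARTITION BY UserID, DateOnly ORDER BY AuditRecordID) AS prior_time,
           LAG(ActivityCode)  OVER (PARTITION BY UserID, DateOnly ORDER BY AuditRecordID) AS prior_code
    FROM AuditData
    WHERE ActivityCode IN (1, 2)
) t
WHERE ActivityCode = 2
  AND prior_code = 1
GROUP BY DateOnly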

APC
I am using SQL but you get an upvote for something interesting I didn't know about Oracle.
Jon Hopkins
I am still having to get used to the idea that "SQL" means MS SQL Server rather than "Structured Query Language".
APC
+1  A: 

If the assumptions are that:

  1. Users will perform various tasks in no mandated order, and
  2. That the difference between any two activities reflects the time it takes for the first of those two activities to execute,

Then why not create a table with two timestamps: the first column containing the activity start time, the second column containing the next activity's start time. Thus the difference between these two will always be the total time of the first activity. So for the logout activity, you would just have NULL for the second column.

So it would be kind of weird and interesting: for each activity (other than logging in and logging out), the timestamp would be recorded in two different rows - once for the last activity (as the time "completed") and again in a new row (as the time started). You would end up with a Jacob's ladder of sorts, but finding the data you are after would be much simpler.

In fact, to get really wacky, you could have each row have the time that the user started activity A and the activity code, and the time started activity B and the time stamp (which, as mentioned above, gets put down again for the following row). This way each row will tell you the exact difference in time for any two activities.
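A rough sketch of what such a table might look like (the table and column names here are made up for illustration, not taken from the question):

create table ActivityDurations (
    UserID nvarchar(10) not null,
    ActivityCode int not null,
    StartedAt datetime not null,
    NextStartedAt datetime null)   -- NULL for the final activity of the session (e.g. logout)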

Otherwise, you're stuck with a query that says something like

SELECT DATEDIFF(SS, row1.DateTimeStamp, row2.DateTimeStamp)

which would be pretty slow, as you have already suggested. By swallowing the redundancy, you end up just querying the difference between the two columns. You probably would have less need of knowing the user info as well, since you'd know that any row shows both activity codes, thus you can just query the average for all users on any given day and compare it to the next day (unless you are trying to find out which users are having the problem as well).
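Against a table shaped like the sketch above, the per-day averages collapse to a plain aggregate, something like:

SELECT DATEADD(dd, DATEDIFF(dd, 0, StartedAt), 0) AS TheDay,
       ActivityCode,
       AVG(DATEDIFF(SS, StartedAt, NextStartedAt)) AS AvgSeconds
FROM ActivityDurations
GROUP BY DATEADD(dd, DATEDIFF(dd, 0, StartedAt), 0), ActivityCode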

Anthony
The issue I have is that this is historical data - I can't record or use more than I have. There are also 48 possible activities and I'd potentially like to be able to measure relationships between each of them (for instance there might be requests to 3rd party systems and receipt of data from third party systems).
Jon Hopkins
+1  A: 

This is a faster way to find out: after the update, each row holds both its own datetime value and the previous row's, so you can then use DATEDIFF ( datepart , startdate , enddate ) between the two columns. I use @DammyVariable and DammyField because, as I remember, there is a problem if the @variable = Field assignment is not first in the update statement.

SELECT *, Cast(NULL AS DateTime) LastRowDateTime, Cast(NULL As INT) DammyField INTO #T FROM AuditData 
GO
CREATE CLUSTERED INDEX IX_T ON #T (AuditRecordID)
GO
DECLARE @LastRowDateTime DateTime
DECLARE @DammyVariable INT

SET @LastRowDateTime = NULL 
SET @DammyVariable = 1

UPDATE #T SET
      @DammyVariable = DammyField = @DammyVariable
    , LastRowDateTime = @LastRowDateTime
    , @LastRowDateTime = DateTimeStamp
OPTION (MAXDOP 1)
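Once LastRowDateTime is populated, the per-day averages can be read off #T with DATEDIFF; the query below is a guess at the intended final step (it assumes the activity-code-2 row directly follows the matching login row, which interleaved users could violate):

SELECT DateOnly,
       AVG(DATEDIFF(SS, LastRowDateTime, DateTimeStamp)) AS AvgSeconds
FROM #T
WHERE ActivityCode = 2
GROUP BY DateOnly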