tags:

views:

75

answers:

2

I have a log file in SqlServer that stores the time an application started, the time the application is ready (i.e. finished loading), and the time that it's exited. Each of these occur as a separate entry. The format (and sample data) is as follows:

Date/Time                  User    Type    Application  Message
2009-11-03 12:26:12.403 uname1 Info app1      Started 
2009-11-03 12:26:22.403 uname1 Info app1      Loaded 
2009-11-03 12:27:15.403 uname2 Info app1      Started 
2009-11-03 12:27:16.401 uname1 Info app1      Exited 
2009-11-03 12:27:18.403 uname2 Info app1      Loaded 
2009-11-03 12:29:12.403 uname2 Info app1      Exited

I would like to find out, per application and per user, the amount of time it took for the application to get to a ready state and the amount of time the application was running. This would be a piece of cake if each date/time was in the same record, and it would also be easy (though tedious) to just load up each record as a cursor and sift through the data, but I thought that there must be some way to do this "correctly" in a set-theory manner.

So, to reiterate, the following output (from the sample data from above) would be expected (numbers are in seconds, rounded up):

User    Application    Ready    Uptime
uname1  app1           10       64
uname2  app1           3        117

Any suggestions?

EDIT: The good news is that the application can only be started once. BUT, the log does not take into account if the application crashed (though I suppose I could look for "exited" and "crashed" as final conditions).

+3  A: 

I'd prefer to not join on the same table multiple times, especially if the table gets huge. This is kind of a two-pass approach. The first pass sorts the times into the right places, and the second pass folds them up by user and application:

SELECT 
  User,
  Application,
  MAX(StartTime) StartTime,
  MAX(ReadyTime) ReadyTime,
  MAX(ExitTime) ExitTime,
FROM (
  SELECT
    User,
    Application,
    CASE (
      WHEN Message = 'Started' THEN Date/Time
      ELSE NULL
    ) StartTime,
    CASE (
      WHEN Message = 'Loaded' THEN Date/Time
      ELSE NULL
    ) ReadyTime,
    CASE (
      WHEN Message = 'Exited' THEN Date/Time
      ELSE NULL
    ) ExitTime
  FROM Log
) Log
GROUP BY 
  User,
  Application

And from there it's trivial to calculate all you want on those different times.

It's not very "set-theory"-like, but grouping and aggregation never are. Like Eric's solution, it doesn't handle the situation when the same user has used the application multiple times. You'd need a third grouping column (like "session" or something) to handle that scenario.

Welbog
Unfortunately this only gives the _last_ time the user started and exited the application. I was hoping for something that would give _each_ start/exit over a given period of time. I'll keep trying to work with this, though.
Michael Todd
+2  A: 

How about

  Select S.user, S.Application,
        S.DateTime Started, L.DateTime Loaded, X.DateTime Exited,
        L.DateTime - S.DateTime LoadTime,
        X.DateTime - L.DateTime RunTime
  From LogFile S
      Full Join LogFile L
          On S.Message = 'Started'
             And L.Message = 'Loaded'
             And L.User = S.user
             And L.Application = S.Application
             And L.DateTime = (Select Min(DateTime)
                               From LogFile 
                               Where Message = 'Loaded'
                                 And application = S.Application
                                 And user = S.user
                                 And DateTime > S.DateTime)
      Full Join LogFile X
          On L.Message = 'Loaded'
             And X.Message = 'Exited'
             And X.User = L.user
             And X.Application = L.Application
             And  X.DateTime = (Select Min(DateTime)
                                From LogFile 
                                Where Message = 'Exited'
                                  And application = L.Application
                                  And user = L.user
                                  And DateTime > L.DateTime)

Then apply aggregate functions to this for what you want:

 Select user, Application, 
    Sum(LoadTime) TotLoadTime,
    Sum(RunTime) TotalRunTime
 From
      (Select S.user, S.Application,
        S.DateTime Started, L.DateTime Loaded, X.DateTime Exited,
        L.DateTime - S.DateTime LoadTime,
        X.DateTime - L.DateTime RunTime
       From LogFile S
          Full Join LogFile L
             On S.Message = 'Started'
                And X.Message = 'Loaded'
                 And L.User = S.user
                 And L.Application = S.Application
                 And L.DateTime =
                        (Select Min(DateTime)
                         From LogFile 
                         Where Message = 'Loaded'
                            And application = S.Application
                            And user = S.user
                            And DateTime > S.DateTime)
         Full Join LogFile X
             On L.Message = 'Loaded'
               And X.Message = 'Exited'
               And X.User = L.user
               And X.Application = L.Application
               And  X.DateTime =
                        (Select Min(DateTime)
                         From LogFile 
                         Where Message = 'Exited'
                             And application = L.Application
                             And user = L.user
                             And DateTime > L.DateTime)) Z
 Group By user, Application
Charles Bretana
As hideous as this is, it actually does work. And it only took 2 seconds of server time to grab it. Thanks.
Michael Todd
Thank you so much for this. I was trying to devise numerous methods to resolve this issue and was just about to throw my hands up and write a program to do it. This is much easier and quicker.
Michael Todd
wellll... thanks so much for the hideous comment (only kidding!) you are very welcome!
Charles Bretana
Perhaps hideous wasn't the correct word; monstrous or unwieldy, maybe. But it's not an easy thing to do, apparently, and it does work, so thanks again.
Michael Todd
When I do stuff like this, I usually put the query doing the self-joins into a view. Then the second query (where you're aggregating) looks much cleaner.
Eric H.
@Michael, Again, no offense taken, it's queries like this that make formatting style so critical to readability and long term maintainability
Charles Bretana