views:

290

answers:

4

I have an event log in a SQL Server database. Essentially it records when a call was made and when that call ended at a call center (as two different records), as well as a few other details. I am trying to get an idea of how many phone lines are being used at any given time with this data. I can't think of any good way to have a SQL query determine this for me, although this would be ideal (if it didn't sacrifice much speed).

My first thought was to have a program query the start and end events for each call, determining the duration of the call. I could then step through each unit of time, keeping a tally of how many calls were in progress at any given time. Is there a way I can do this in SQL instead of using a linear method in C# or something similar?

Edit: There is a unique ID for the calls. A session ID, if you will. Also, the start and end events are two different records -- not a single record. This complicates this a bit I think. Also, this table has over 15 million records in it.

Id  EvId             CallId                           DateTime       
--  ---- ------------------------------------    --------------------
 1  0 df1cbc93-5cf3-402a-940b-4441f6a7ec5c  7/9/2008 8:12:56 PM
 2  1 df1cbc93-5cf3-402a-940b-4441f6a7ec5c  7/9/2008 8:13:07 PM
 3  0 ec1c2078-1765-4377-9126-6f26fe33e4a9 7/10/2008 4:33:10 PM
 4  10 ec1c2078-1765-4377-9126-6f26fe33e4a9 7/10/2008 4:33:13 PM
 5  1 ec1c2078-1765-4377-9126-6f26fe33e4a9 7/10/2008 4:33:13 PM
 6  0 a3c3b9a0-a23b-4dda-b4e4-e82f0209c94d 7/10/2008 4:33:13 PM
 7  10 a3c3b9a0-a23b-4dda-b4e4-e82f0209c94d 7/10/2008 4:33:15 PM
 8  1 a3c3b9a0-a23b-4dda-b4e4-e82f0209c94d 7/10/2008 4:33:15 PM
 9  0 d23f393d-0272-445a-8670-3f71b016174e 7/10/2008 4:33:15 PM
10  10 d23f393d-0272-445a-8670-3f71b016174e 7/10/2008 4:33:17 PM
11  1 d23f393d-0272-445a-8670-3f71b016174e 7/10/2008 4:33:17 PM


EvId   Description
----   ----------------
  0 New Call
  1 End of Call
  2 Caller Hangup
 10 CPA Completed
+1  A: 

Here is a query that produces the log of all events in a given period, along with the current count of calls when the event occured. It uses several CTEs to build up the needed data in a logical sequence of steps: select calls started before start time, substract calls ended before start time, add call events between start and end time. This result set is then used to produce the timeline of events, along with a count of current calls at any event. The CTEs are used simply because I find them much more easier to read and understand than derived tables.

declare @temp table (
    EvId int not null
    , CallId uniqueidentifier not null
    , DateTime Datetime not null);

 declare @starttime datetime
    , @endtime datetime;

 select @starttime = '7/10/2008 1:33:14 PM';
 select @endtime = '7/10/2008 1:43:14 PM';

 -- These are all the calls
 -- that started before the start time
 with started_call as (
 select * from call_log 
    where DateTime < @starttime 
    and EvId = 0)
-- These are all the calls 
-- that ended    before the start time
 , ended_call as (
 select * from call_log 
    where DateTime < @starttime 
    and EvId = 1)
-- These are all the call ids 
-- that were ongoing at the start time  
 , existing_calls as (
 select CallId from started_call
 except
 select CallId from ended_call)
-- These are all the call events logged
-- for calls that were were ongoing at the start time   
 , existing_details as (
 select l.* 
    from call_log l
    join existing_calls e on e.CallId = l.CallId
    where l.DateTime < @starttime)
-- these are events that occured
-- between start time and endtime   
, new_events as (
    select * from call_log
    where DateTime between @starttime and @endtime)
-- and these are all the events that are of interest
, all_events as (
    select * from existing_details
    union all
    select * from new_events)
-- put all the interesting events into a @temp table
-- unfortunately QO cannot spool this for us
-- so we better do it isntead   
insert into @temp (EvId, CallId, DateTime)
    select EvId, CallId, DateTime  from all_events;

-- Extract events, along with the count
-- at the time of the event
select e.*,(
     select sum(case
      when EvId = 0 then 1 -- Start call
      when EvId = 1 then -1 -- end call
      else 0 end) -- Other events 
     from @temp se
     where se.DateTime < e.DateTime) as cnt
from @temp e
where DateTime between @starttime and @endtime
order by DateTime;

This query produces a plan that does not scan the entire log table, given proper indexes exists in place. It gives correct results for any interval, tacking into account existing calls at the interval start time. In my testing on 1 mil log records it consistently produced events for 10 minutes intervals in 1.1s (628ms to produce the @temp table, 505ms to produce the timeline with current count) on a 1.5GB RAM single proc laptop. Performance on large tables can be improved if a restriction on the maximum duration of any call is introduced, because the search for existing calls at start time can then be bounded on the lower end (DatTime >= start time - max duration of a call).

The use of an intermediate @temp table variable is no elegant, but is efficient.

Here is a sample output:

EvId    CallId                                 DateTime               cnt
1   401D9E00-040C-4B0E-8864-C66B72CF47AA 2008-07-10 13:33:16.000 23
10  401D9E00-040C-4B0E-8864-C66B72CF47AA 2008-07-10 13:33:16.000 23
1   8BF7AF50-B32C-464A-AF01-FDB653F0517D 2008-07-10 13:33:18.000 22
10  8BF7AF50-B32C-464A-AF01-FDB653F0517D 2008-07-10 13:33:18.000 22
0   CB523E24-5CE2-4E36-9D6C-4AE7BCEB1F53 2008-07-10 13:33:19.000 21
1   4A54EEB6-A899-4167-9D5C-2CE1BC838FFB 2008-07-10 13:33:20.000 22

Here is how I created and loaded the test data. Note the clustered index and nonclustered index on the table, they are both critical.

create table call_log (id int identity(1,1) not null
    , EvId int not null
    , CallId uniqueidentifier not null
    , DateTime Datetime not null);
create clustered index cdx_call_log on call_log(EvId, DateTime);
create nonclustered index idx_call_log_call_id on call_log(CallId);
go

 set nocount on;
 declare @i int, @date datetime, @callId uniqueidentifier;
 select @i = 0, @date = '7/10/2008 12:33:14 PM';
 begin transaction
 while @i < 1000000
 begin
    declare @duration int,
     @delay int;
    select @duration = rand()*180,
     @delay = rand() * 10;
    select @date = dateadd(second, @delay, @date)
     , @callId = newid();

    insert into call_log (EvId, CallId, DateTime)
    values  (0, @callId, @date)
     , (10, @callId, dateadd(second, @duration, @date))
     , (1, @callId, dateadd(second, @duration, @date));
    select @i = @i + 1;
    if (0 = @i%100)
    begin
     commit;
     begin tran;
    end
 end
 commit
 go
Remus Rusanu
I like the solutions there. Now my question is narrowed to: is this a better solution that simply grabbing the data and tossing it into C# and performing a linear analysis?
Joe Philllips
Usually yes. If the data has any significant size, the cost of 'grabbing and tossing' it into C# will be overwhelming.
Remus Rusanu
I must add that each record does not have a start and end time. The start event is a single record and the end event is another record.
Joe Philllips
+1  A: 

Before you use my query examples, you need to set up a "helper" table, you only need to do this one time per database:

CREATE TABLE Numbers
(Number int  NOT NULL,
    CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number ASC)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
DECLARE @x int
SET @x=0
WHILE @x<8000
BEGIN
    SET @x=@x+1
    INSERT INTO Numbers VALUES (@x)
END

This basically creates a table containing a single column containing values from 1 to 8000. You can use a CTE to do the same, but since you son't say the SQL Server version, this will work for all, and is better if you will run this many times.

try this:

DECLARE @Calls  table (rowID int not null primary key identity(1,1)
                      ,EvId int not null
                      ,CallId varchar(36)
                      ,rowDateTime datetime
                      )
SET NOCOUNT ON
INSERT INTO @Calls VALUES ( 0,'df1cbc93-5cf3-402a-940b-4441f6a7ec5c',' 7/9/2008 8:12:56 PM')
INSERT INTO @Calls VALUES ( 1,'df1cbc93-5cf3-402a-940b-4441f6a7ec5c',' 7/9/2008 8:13:07 PM')
INSERT INTO @Calls VALUES ( 0,'ec1c2078-1765-4377-9126-6f26fe33e4a9','7/10/2008 4:33:10 PM')
INSERT INTO @Calls VALUES (10,'ec1c2078-1765-4377-9126-6f26fe33e4a9','7/10/2008 4:33:13 PM')
INSERT INTO @Calls VALUES ( 1,'ec1c2078-1765-4377-9126-6f26fe33e4a9','7/10/2008 4:33:13 PM')
INSERT INTO @Calls VALUES ( 0,'a3c3b9a0-a23b-4dda-b4e4-e82f0209c94d','7/10/2008 4:33:13 PM')
INSERT INTO @Calls VALUES (10,'a3c3b9a0-a23b-4dda-b4e4-e82f0209c94d','7/10/2008 4:33:15 PM')
INSERT INTO @Calls VALUES ( 1,'a3c3b9a0-a23b-4dda-b4e4-e82f0209c94d','7/10/2008 4:33:15 PM')
INSERT INTO @Calls VALUES ( 0,'d23f393d-0272-445a-8670-3f71b016174e','7/10/2008 4:33:15 PM')
INSERT INTO @Calls VALUES (10,'d23f393d-0272-445a-8670-3f71b016174e','7/10/2008 4:33:17 PM')
INSERT INTO @Calls VALUES ( 1,'d23f393d-0272-445a-8670-3f71b016174e','7/10/2008 4:33:17 PM')
--I added more test data, to hit more cases
INSERT INTO @Calls VALUES ( 0,'111111111111111111111111111111111111','7/10/2008 4:10:00 PM')
INSERT INTO @Calls VALUES (10,'111111111111111111111111111111111111','7/10/2008 4:11:00 PM')
INSERT INTO @Calls VALUES ( 1,'111111111111111111111111111111111111','7/10/2008 4:11:00 PM')
INSERT INTO @Calls VALUES ( 0,'222222222222222222222222222222222222','7/10/2008 4:15:00 PM')
INSERT INTO @Calls VALUES (10,'222222222222222222222222222222222222','7/10/2008 4:16:00 PM')
INSERT INTO @Calls VALUES ( 1,'222222222222222222222222222222222222','7/10/2008 4:16:00 PM')
INSERT INTO @Calls VALUES ( 0,'333333333333333333333333333333333333','7/10/2008 4:09:00 PM')
INSERT INTO @Calls VALUES (10,'333333333333333333333333333333333333','7/10/2008 4:18:00 PM')
INSERT INTO @Calls VALUES ( 1,'333333333333333333333333333333333333','7/10/2008 4:18:00 PM')
INSERT INTO @Calls VALUES ( 0,'444444444444444444444444444444444444','7/10/2008 4:13:00 PM')
INSERT INTO @Calls VALUES (10,'444444444444444444444444444444444444','7/10/2008 4:14:00 PM')
INSERT INTO @Calls VALUES ( 1,'444444444444444444444444444444444444','7/10/2008 4:14:00 PM')
INSERT INTO @Calls VALUES ( 0,'555555555555555555555555555555555555','7/10/2008 4:13:00 PM')
SET NOCOUNT OFF

DECLARE @StartRange  datetime
DECLARE @EndRange    datetime

SET @StartRange='7/10/2008 4:12:00 PM'
SET @EndRange  ='7/10/2008 4:15:00 PM'

SET @EndRange=DATEADD(mi,1,@EndRange)

--this lists the match time and each calls details in progress at that time
SELECT
    DATEADD(mi,n.Number-1,c.StartTime) AS 'TimeOfMatch'
        ,c.CallID
        ,c.StartTime,c.EndTime
    FROM (SELECT --this derived table joins together the start and end dates into a single row, filtering out rows more than 90 minutes before the start range (if calls are longer than 90 minutes, increase this) and filters out any rows after the end date (will consider call done at end date then)
              CallID, MIN(rowDateTime) AS StartTime, CASE  WHEN MAX(rowDateTime)=MIN(rowDateTime) THEN @EndRange ELSE MAX(rowDateTime) END  AS EndTime
              FROM @Calls 
              WHERE rowDateTime>=DATEADD(mi,-90,@StartRange) --AND rowDateTime<=@EndRange
              GROUP BY CallID
         ) c
        INNER JOIN Numbers   n ON DATEDIFF(mi,c.StartTime,c.EndTime)+1>=n.Number
    WHERE DATEADD(mi,n.Number-1,c.StartTime)>=@StartRange AND DATEADD(mi,n.Number-1,c.StartTime)<@EndRange
    ORDER BY 1

--this lists just the match time and the call count
SELECT
    DATEADD(mi,n.Number-1,c.StartTime) AS 'TimeOfMatch'
        ,c.CallID
        ,c.StartTime,c.EndTime
    FROM (SELECT --this derived table joins together the start and end dates into a single row, filtering out rows more than 90 minutes before the start range (if calls are longer than 90 minutes, increase this) and filters out any rows after the end date (will consider call done at end date then)
              CallID, MIN(rowDateTime) AS StartTime, CASE  WHEN MAX(rowDateTime)=MIN(rowDateTime) THEN @EndRange ELSE MAX(rowDateTime) END  AS EndTime
              FROM @Calls 
              WHERE rowDateTime>=DATEADD(mi,-90,@StartRange) --AND rowDateTime<=@EndRange
              GROUP BY CallID
         ) c
        INNER JOIN Numbers   n ON DATEDIFF(mi,c.StartTime,c.EndTime)+1>=n.Number
    WHERE DATEADD(mi,n.Number-1,c.StartTime)>=@StartRange AND DATEADD(mi,n.Number-1,c.StartTime)<@EndRange
    ORDER BY 1

here's the output:

TimeOfMatch             CallID                               StartTime               EndTime
----------------------- ------------------------------------ ----------------------- -----------------------
2008-07-10 16:12:00.000 333333333333333333333333333333333333 2008-07-10 16:09:00.000 2008-07-10 16:18:00.000
2008-07-10 16:13:00.000 333333333333333333333333333333333333 2008-07-10 16:09:00.000 2008-07-10 16:18:00.000
2008-07-10 16:13:00.000 444444444444444444444444444444444444 2008-07-10 16:13:00.000 2008-07-10 16:14:00.000
2008-07-10 16:13:00.000 555555555555555555555555555555555555 2008-07-10 16:13:00.000 2008-07-10 16:16:00.000
2008-07-10 16:14:00.000 555555555555555555555555555555555555 2008-07-10 16:13:00.000 2008-07-10 16:16:00.000
2008-07-10 16:14:00.000 444444444444444444444444444444444444 2008-07-10 16:13:00.000 2008-07-10 16:14:00.000
2008-07-10 16:14:00.000 333333333333333333333333333333333333 2008-07-10 16:09:00.000 2008-07-10 16:18:00.000
2008-07-10 16:15:00.000 333333333333333333333333333333333333 2008-07-10 16:09:00.000 2008-07-10 16:18:00.000
2008-07-10 16:15:00.000 555555555555555555555555555555555555 2008-07-10 16:13:00.000 2008-07-10 16:16:00.000
2008-07-10 16:15:00.000 222222222222222222222222222222222222 2008-07-10 16:15:00.000 2008-07-10 16:16:00.000

(10 row(s) affected)

TimeOfMatch             
----------------------- -----------
2008-07-10 16:12:00.000 1
2008-07-10 16:13:00.000 3
2008-07-10 16:14:00.000 3
2008-07-10 16:15:00.000 3

(4 row(s) affected)

You will need a composite index on rowDateTime+CallId. However for the best performance, if you created a new table (clustered index on startdate+CallId) that contained both the start and end dates of a single call (possibly using a trigger when the EvId=0 insert with start date, and when EvId=1 update end date) then the derived table could be removed with this new table.

KM
is this what you need, or are you after something else? if so let me know...
KM
Testing
Joe Philllips
I think the only issue is that I am trying to graph these counts (calls/time) so I need to do an iteration of some sort to get the count for every second. I would like to specify the time range at of which it should be graphed (ie. all calls between 12:00 and 1:00 on a certain day)
Joe Philllips
The only other issue is that this table has 15million records so any optimizations would be an improvement
Joe Philllips
so you want the count of every second for how long of a range? all 15 million or a day? can I ignore all calls that have not completed yet?
KM
I would probably be looking for just a couple hours of time. If I could specify the time range that would be best. Within that range I would like to know, for every second (or every 5 seconds), how many calls are currently active.
Joe Philllips
can calls still in progress be ignored?
KM
Yes, not a problem...
Joe Philllips
Nice work! SQL Server 2000
Joe Philllips
A: 

Try this :

DECLARE @tblCalls TABLE(ActionEffect int, ActionTime datetime)

INSERT INTO @tblCalls(ActionEffect, ActionTime)
    SELECT 1, [DateTime]
    FROM tblCallRecords
    WHERE EviD = 0

INSERT INTO @tblCalls(ActionEffect, ActionTime)
    SELECT -1, [DateTime]
    FROM tblCallRecords
    WHERE EvID > 0

(I'm assuming that EvID's other than 0 indicate the end of a call?)

Then, to get the number of calls at any given moment, you do :

SELECT Sum(ActionEffect)
FROM @tblCalls
WHERE ActionTime < @GivenMoment

Not too nice with 15 million records, though.

Now, if you wanted a running total of this, you're probably going to need to do something like this :

SELECT a.ActionTime, Sum(b.ActionEffect) AS OpenCalls
FROM @tblCalls AS a
LEFT JOIN @tblCalls AS b ON a.ActionTime > b.ActionTime
GROUP BY a.ActionTime

which gets huge quickly. I think I'd run it once, store the results in a table and modify my call-recording mechanism code to update it on the fly as calls come in.

CodeByMoonlight
A: 

This is not a solution, but just throwing some ideas out there. Haven't tested this, so feel free to shoot them down if there are rubbish.

This sort of assumes two things

1) There is an index on DateTime and UniqueID

2) That a call will not last longer than a certain length of time (say 24 hours or 48 hours) or can be ignored if it does so.

If not, then you may stop reading.

If yes, if you start with a query something like

 Select CallId, 
     Min(DateTime) as StartOfCall , Max(DateTime) as EndofCall        
 from Call_log
 where
    (evid = 0 or evid=1)
 and DateTime between @ExtendedStartPeriod and @ExtendedEndPeriod

where ExtendedStartPeriod and ExtendedEndPeriod are a day before and a day after your actual period (or two days if your maximum call length is 48 hours)

This will give you some records you don't want, so you do an further query to remove these

Select UniqueID from (...) table1
where StartOfCall <= @EndDate or EndOfCall >= @StartDate

This should (I think) exclude the calls that start after your finish period or calls that end before the start date.

Next we perform another outer query

Select DateTime, 
  CallChange = Case 
  When Evid = 0 then 1
  When Evid = 1 then -1
  else 0
 end
 from call_log 
 where 
  unique_id in ( ... )  
  and (evid = 0 or evid=1)
 and DateTime between @ExtendedStartPeriod and @ExtendedEndPeriod

This should give you a list of times of events and whether they are increasing or decreasing the number of calls. In your example, something like

         7/9/2008 8:12:56 PM  1
         7/9/2008 8:13:07 PM -1
        7/10/2008 4:33:10 PM  1
        7/10/2008 4:33:13 PM -1
        7/10/2008 4:33:13 PM  1
        7/10/2008 4:33:15 PM -1
        7/10/2008 4:33:15 PM  1
        7/10/2008 4:33:17 PM -1

If there is an extremely high volume of calls per second it might help to group this by minute to reduce the size of the data returned from sql.

It might even by possible to do a further query

Select 
   Count(CallChange) ,
   DatePart("yyyy", DateTime) , 
   DatePart("mm", DateTime),
   DatePart("dd", DateTime),
   DatePart("hh", DateTime),
   DatePart("mi", DateTime)
   DatePart("ss", DateTime)
From
   ( ...) 

  Group By
     DatePart("yyyy", DateTime) , 
     DatePart("mm", DateTime),
     DatePart("dd", DateTime),
     DatePart("hh", DateTime),
     DatePart("mi", DateTime)
     DatePart("ss", DateTime)

That's about as as I can go with Sql, maybe someone can take it further, otherwise I think would need to do a bit of C# to keep a running count of the transactions per period.

sgmoore