views:

412

answers:

4

I currently have a table (SQL Server 2005) that logs the visits against my web app, and I want to put together some code to report (and display a visualization) of that traffic. What I want is to display the number of visits during each ten-minute interval over the last 24 hours.

I have a query that does just that, but there are ten-minute intervals during which there are no visits, and I would like to adjust my query to display a zero count for those intervals. I imagine I could come up with something using cursors, but I'd rather not use them if I can avoid it.

Here is the query so far:

DECLARE @time int
DECLARE @interval int
SELECT @time=96
SELECT @interval=10
SELECT interval,
COUNT(*) AS requestCount,
DATEDIFF(MINUTE,DATEADD(HOUR,-1*@time-1,getDate()),interval)/@interval AS intervalPos
FROM 
(SELECT DATEADD(MINUTE, FLOOR(DATEDIFF(MINUTE, getDate(), requestBegin) / (@interval*1.0)) * @interval,getDate())
FROM [track_pageSubmit] WHERE requestBegin IS NOT NULL AND DATEDIFF(HOUR,requestBegin,getDate()) < @time) AS I(interval) 
GROUP BY interval ORDER BY interval

Here is the table structure:

CREATE TABLE [dbo].[Track_PageSubmit](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [popid] [int] NOT NULL,
    [section] [varchar](30) NULL,
    [page] [int] NULL,
    [requestBegin] [datetime] NULL,
    [requestEnd] [datetime] NULL,
    [rendered] [datetime] NULL,
    [postBegin] [datetime] NULL,
    [postEnd] [datetime] NULL
)

And here is what some of the records look like:

INSERT INTO track_pageSubmit (popid,section,page,requestbegin,requestend,rendered,postbegin,postend)
SELECT '2393712','Main_Can_Eng','10','2010-01-22 10:22:08.287','2010-01-22 10:22:08.330',NULL,'2010-01-22 10:22:09.503','2010-01-22 10:22:09.627' UNION
SELECT '2393712','Main_Can_Eng','11','2010-01-22 10:22:09.660','2010-01-22 10:22:09.770',NULL,'2010-01-22 10:22:10.973','2010-01-22 10:22:11.050' UNION
SELECT '2393712','Main_Can_Eng','12','2010-01-22 10:22:11.080','2010-01-22 10:22:11.143',NULL,'2010-01-22 10:22:12.503','2010-01-22 10:22:12.567' UNION
SELECT '2394478','main','21','2010-01-21 10:38:54.057','2010-01-21 10:38:54.117','2010-01-21 10:38:54.487','2010-01-21 10:38:55.633','2010-01-21 10:38:55.697' UNION
SELECT '2394478','main','22','2010-01-21 10:38:55.757','2010-01-21 10:38:55.820','2010-01-21 10:38:56.197','2010-01-21 10:38:57.477','2010-01-21 10:38:57.570' UNION
SELECT '2394478','main','23','2010-01-21 10:38:57.617','2010-01-21 10:38:57.993','2010-01-21 10:38:58.367','2010-01-21 10:38:59.397','2010-01-21 10:38:59.493' UNION
SELECT '2394478','main','25','2010-01-21 10:38:59.553','2010-01-21 10:38:59.617','2010-01-21 10:38:59.993','2010-01-21 10:39:01.227','2010-01-21 10:39:01.303' UNION
SELECT '2394478','main','26','2010-01-21 10:39:01.350','2010-01-21 10:39:01.477','2010-01-21 10:39:01.860','2010-01-21 10:39:02.787','2010-01-21 10:39:02.867' UNION
SELECT '2394478','main','27','2010-01-21 10:39:02.930','2010-01-21 10:39:03.007','2010-01-21 10:39:03.400','2010-01-21 10:39:04.147','2010-01-21 10:39:04.460' UNION
SELECT '2394478','main','28','2010-01-21 10:39:04.507','2010-01-21 10:39:05.147','2010-01-21 10:39:05.790','2010-01-21 10:39:19.413','2010-01-21 10:39:19.477' UNION
SELECT '2393754','exp46_cex','1','2010-01-22 12:40:56.563','2010-01-22 12:40:56.640',NULL,'2010-01-22 12:40:58.657','2010-01-22 12:40:58.733' UNION
SELECT '2393754','exp46_cex','2','2010-01-22 12:40:58.750','2010-01-22 12:40:58.780',NULL,'2010-01-22 12:41:15.623','2010-01-22 12:41:15.657' UNION
SELECT '2393754','additionalComments','1','2010-01-22 12:41:15.670','2010-01-22 12:41:15.733',NULL,'2010-01-22 12:41:19.000','2010-01-22 12:41:19.030' UNION
SELECT '2393802','main','2','2010-01-22 12:44:50.857','2010-01-22 12:44:50.933',NULL,'2010-01-22 12:44:53.497','2010-01-22 12:44:53.557' UNION
SELECT '2393802','main','3','2010-01-22 12:44:53.590','2010-01-22 12:44:53.667',NULL,'2010-01-22 12:44:56.370','2010-01-22 12:44:56.730'

Bonus points (in the form of extra thanks from me) to anyone who can change the query so that I can also report distinct popids per interval (in addition to total requests).

Thanks!

+2  A: 

Easy way to do this is to use a CTE and make a helper table with the start time interval, then just join to that table in the main query using between.

Did that make sense? I'll work on some example code in a bit.

DECLARE @time int
DECLARE @interval int
SELECT @time=96
SELECT @interval=10

DECLARE @count int
SELECT @count=1

;WITH daterange AS
(
   SELECT 1 as [id], Max(requestbegin) as maxr, Min(requestBegin) as minr
   FROM track_pagesubmit
), intervals as
(
   SELECT @count AS interval, minr as intervalpos
   FROM daterange
   WHERE [id] = 1
   UNION ALL
   SELECT interval+1 AS interval,
          DATEADD(MINUTE,@interval,intervalpos) as intervalpos
   FROM intervals
   JOIN daterange on [ID] = 1
   WHERE DATEADD(MINUTE,@interval,intervalpos) < maxr
)
SELECT interval, intervalpos, 
       COUNT(DISTINCT track_pagesubmit.popid) as popcount,
       COUNT(track_pagesubmit.id AS requestcount 
FROM intervals
LEFT JOIN track_pagesubmit ON requestbegin IS NOT NULL 
  AND requestBegin BETWEEN intervalpos AND DATEADD(ns,-1,DATEADD(MINUTE,@interval,intervalpos))
GROUP BY interval, intervalpos 
OPTION (MAXRECURSION 200)
Hogan
You can't really use BETWEEN because if your data happens to fall on a boundary it will be included in two sets. IMHO should always use an open-ended range (>= begin and < end).
Aaron Bertrand
@Aaron: See above, between worked fine. (Of course I "cheat" and subtract 1 from @interval when I do the add.)
Hogan
Then you could miss data between x:59.003 and x:59.997. Why force yourself to do all of these math tricks when an open-ended range is more precise and less error-prone?
Aaron Bertrand
@Aaron: Ok, I changed it to subtract a nano-second from the end time. I'm not sure, I think BETWEEN is easier to read (and thus maintain) but I will grant you that in this use-case the range may be better.
Hogan
"Easier to read" is quite subjective. I think that when you have to start subtracting nanoseconds and worrying about rounding due to type conversion (what do you think happens to your rounding when you move to SMALLDATETIME?) the "benefits" of BETWEEN disappear quickly. This article is kind of long but this is one of the points I cover: http://sqlblog.com/blogs/aaron_bertrand/archive/2009/10/16/bad-habits-to-kick-mishandling-date-range-queries.aspx
Aaron Bertrand
I agree, subjective and in this case probably better your way, but I leave it as written for the context of other readers.
Hogan
A: 

The recursive CTE always seems to outperform the numbers table, but both methods are traditionally used for this.

Cade Roux
+1  A: 

-- Because your sample data spans 27 hours:

DECLARE
    @hours TINYINT,
    @minute_interval TINYINT,
    @start SMALLDATETIME;

SELECT
    @hours = 27,
    @minute_interval = 10,
    @start = '20100122 13:00';

;WITH x AS 
(
    SELECT TOP (@hours * (60 / @minute_interval))
        n = ROW_NUMBER() OVER
        (ORDER BY column_id)
    FROM msdb.sys.columns
),
intervals(boundary) AS
(
    SELECT CONVERT
    (
      SMALLDATETIME,
      DATEADD(MINUTE, (-n * @minute_interval), @start)
    )
    FROM x
)
SELECT
    i.boundary,
    RequestCount = COUNT(d.id),
    DistinctPopIDs = COUNT(DISTINCT d.popid)
FROM
    intervals AS i
LEFT OUTER JOIN
    dbo.Track_PageSubmit AS d
    ON d.requestBegin >= i.boundary
    AND d.requestBegin < DATEADD(MINUTE, @minute_interval, i.boundary)
GROUP BY i.boundary
ORDER BY i.boundary;
Aaron Bertrand
Sorry about that, I was just grabbing some representative data. I forgot to check the interval.
Jason Francis
This worked beautifully. Thank you.
Jason Francis
A: 

This is a bit rough, but should work for you:

WITH Nbrs_2( n ) AS ( SELECT 1 UNION SELECT 0 ),
    Nbrs_1( n ) AS ( SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2 ),
    Nbrs_0( n ) AS ( SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2 ),
    Nbrs  ( n ) AS ( SELECT 1 FROM Nbrs_0 n1 CROSS JOIN Nbrs_0 n2 )
select n.n as TenMinuteInterval, count(case when t.requestBegin is null then null else 1 end) as Count
from (
    SELECT n - 1 as n
    FROM (SELECT ROW_NUMBER()  OVER (ORDER BY n)
    FROM Nbrs) D ( n )
    WHERE n <= 144
) n
left outer join track_pageSubmit t on n.n = floor((cast(requestBegin - convert(int, requestBegin) as decimal(10,3)) % 1) * 144) 
    and requestBegin between '2010-01-21' and '2010-01-22'
group by n.n

Output:

TenMinuteInterval Count
-----------------------
0                  0
1                  0
2                  0
3                  0
4                  0
5                  0
6                  0
7                  0
8                  0
9                  0
10                 0
11                 0
12                 0
13                 0
14                 0
15                 0
16                 0
17                 0
18                 0
19                 0
20                 0
21                 0
22                 0
23                 0
24                 0
25                 0
26                 0
27                 0
28                 0
29                 0
30                 0
31                 0
32                 0
33                 0
34                 0
35                 0
36                 0
37                 0
38                 0
39                 0
40                 0
41                 0
42                 0
43                 0
44                 0
45                 0
46                 0
47                 0
48                 0
49                 0
50                 0
51                 0
52                 0
53                 0
54                 0
55                 0
56                 0
57                 0
58                 0
59                 0
60                 0
61                 0
62                 0
63                 7
64                 0
65                 0
66                 0
67                 0
68                 0
69                 0
70                 0
71                 0
72                 0
73                 0
74                 0
75                 0
76                 0
77                 0
78                 0
79                 0
80                 0
81                 0
82                 0
83                 0
84                 0
85                 0
86                 0
87                 0
88                 0
89                 0
90                 0
91                 0
92                 0
93                 0
94                 0
95                 0
96                 0
97                 0
98                 0
99                 0
100                0
101                0
102                0
103                0
104                0
105                0
106                0
107                0
108                0
109                0
110                0
111                0
112                0
113                0
114                0
115                0
116                0
117                0
118                0
119                0
120                0
121                0
122                0
123                0
124                0
125                0
126                0
127                0
128                0
129                0
130                0
131                0
132                0
133                0
134                0
135                0
136                0
137                0
138                0
139                0
140                0
141                0
142                0
143                0
RedFilter
Just ran this on the test data, got no results in count.
Hogan
...because the where clause (`getdate() - requestBegin <= 1.0`) is filtering for the last 24 hours (as you requested), whereas the test data is older than that...
RedFilter
I modified the `WHERE` clause to show data for 2010/01/21 - it returns count 7 for interval 63. Note intervals are zero-based in this query. If you want one-based, change `SELECT n - 1 as n` to `SELECT n as n`.
RedFilter
Added sample output
RedFilter