tags:

views:

504

answers:

5

We are building a query to count the number of events per hour, per day. Most days there are hours that do not have any activity and therefore where the query is run the count of activities per hour show up but there are gaps and the query excludes these. We still want to show the hours that do not have activity and display a zero so that zero value can then be charted. The query we using looks like this …

select datepart(Year, dev_time) as Year,
    datepart(Month, dev_time) as Month,
    datepart(Day, dev_time) as Day,
    datepart(Hour, dev_time) as Hour,
    count(tdm_msg) as Total_ACTIVITES
from TCKT_ACT
where tdm_msg = ‘4162′ and dev_time >= DATEADD(day, - 1, GETDATE())
group by datepart(Year, dev_time) ,
    datepart(Month, dev_time) ,
    datepart(Day, dev_time),
    datepart(Hour, dev_time)
order by datepart(Year, dev_time) asc,
    datepart(Month, dev_time) asc,
    datepart(Day, dev_time) asc,
    datepart(Hour, dev_time) asc
+2  A: 

You are going to somehow need a table of days and hours, and then you will have to do an outer join between that table and your query. Here's how I would do it. Note that this solution will only work in SQL Server 2005 and 2008. If you don't have those platforms, you'll have to actually create a table of times in your database from which you can join off of:

DECLARE @MinDate DATETIME;
SET @MinDate =  CONVERT(varchar, GETDATE(), 101);

WITH times AS (
    SELECT @MinDate as dt, 1 as depth
    UNION ALL
    SELECT DATEADD(hh, depth, @MinDate), 1 + depth as depth
    FROM times
    WHERE DATEADD(hh, depth, @MinDate) <= GETDATE())
SELECT DATEPART(YEAR, t.dt) as [Year],
    DATEPART(MONTH, t.dt) as [Month],
    DATEPART(DAY, t.dt) as [Day],
    DATEPART(HOUR, t.dt) as [Hour],
    COUNT(tdm_msg) as Total_ACTIVITES
FROM times t
LEFT JOIN (SELECT * FROM TCKT_ACT WHERE tdm_msg = '4162' and dev_time >= @MinDate) a
    ON  DATEPART(HOUR, t.dt)  = DATEPART(HOUR, a.dev_time)
    AND MONTH(t.dt) = MONTH(a.dev_time)
    AND DAY(t.dt)   = DAY(a.dev_time)
    AND YEAR(t.dt)  = YEAR(a.dev_time)
GROUP BY DATEPART(YEAR, t.dt) ,
    DATEPART(MONTH, t.dt) ,
    DATEPART(DAY, t.dt),
    DATEPART(HOUR, t.dt)
ORDER BY DATEPART(YEAR, t.dt) asc,
    DATEPART(MONTH, t.dt) asc,
    DATEPART(DAY, t.dt) asc,
    DATEPART(HOUR, t.dt) asc
OPTION (MAXRECURSION 0); /* Just in case you want a longer timespan later on... */

Note that the WITH statement at the top is called a recursive common table expression, and is a good way of generating sequential tables with relatively small numbers of elements, like you have here.

Dave Markle
+2  A: 

We had a similar problem with some performance monitoring software but, being in a DB2/z mainframe shop, we're dead set against having to do SQL gymnastics to get those sort of results. SQL queries that perform 'functions' on every row they retrieve are notoriously unscalable and the DBAs would have a field day laughing at us if we tried to use them.

Instead, we found it easier to refactor the database schema to include a count of events in each row (apparently our DBAs don't mind using more disk space, just more CPU grunt). In your case, that would be adding a column called tdm_quant which you would set to 1 for every row that you insert (i.e., each event).

Then the fifth field of your query changes from count(tdm_msg) to sum(tdm_quant) which will achieve the same result.

In addition to that you can insert a special record (once an hour, or 24 of them at the start of each day, or populate the entire years worth on January 1 if you wish) where the tdm_quant field is set to zero. Being zero, these records will have no effect on the sum(tdm_quant) but you will get your desired behaviour, a row returned for every hour of the day which will have zero as Total_ACTIVITIES where no events occurred in that hour.

The rest of your query will not need to change.

paxdiablo
Interesting circumvention...
Jonathan Leffler
It was considered neat - for a one-time cost of expanding the table, a once a year job adding 8860 (365*24) rows in a table that holds 500,000+ 500-octet rows (so extra disk not really relevant either, for us), we get MUCH faster queries and reports and they're done many dozens of times a day.
paxdiablo
And the best part, my bonus was upped for innovative thinking, and I got an award from the DBAs for making their lives easier. Well, best for me, anyway :-)
paxdiablo
A: 

It sounds like you could use a "left outer join" using another table that has the numbers 1 through 24 in it...

I think it would take at least a combination of date and time - though granted the times would be 'on the hour'. And generating a sufficient set of values in the right range is not trivial in the average DBMS.
Jonathan Leffler
A: 

The basic answer here involves a left outer join (LOJ), and an explicit COUNT(column) since that does not count nulls but COUNT(*) counts all rows. The hard part is generating a table against which to do the LOJ. The WITH clause and recursive solution will work in a number of DBMS (MS SQL Server, apparently, and almost certainly DB2 -- probably others too).

Many DBMS support temporary tables and stored procedures; the combination could be used to populate a table with an appropriate set of values for the date/time field, and then do the LOJ against that table (or, more precisely, FROM temp_table LEFT OUTER JOIN main_table ...). Not as neat and tidy, but works most places.

Jonathan Leffler
+1  A: 

First I created a table function based on the recursive common table query described by Dave Markle (thanks for showing me this Dave!). This is extremely sweet because I only have to make the function once and I can use it for analysing any intervals.

if exists (select * from dbo.sysobjects where name = 'fn_daterange') drop function fn_daterange;
go

create function fn_daterange
   (
   @MinDate as datetime,
   @MaxDate as datetime,
   @intval  as datetime
   )
returns table
--**************************************************************************
-- Procedure: fn_daterange()
--    Author: Ron Savage
--      Date: 12/16/2008
--
-- Description:
-- This function takes a starting and ending date and an interval, then
-- returns a table of all the dates in that range at the specified interval.
--
-- Change History:
-- Date        Init. Description
-- 12/16/2008  RS    Created.
-- **************************************************************************
as
return
   WITH times (startdate, enddate, intervl) AS
      (
      SELECT @MinDate as startdate, @MinDate + @intval - .0000001 as enddate, @intval as intervl
         UNION ALL
      SELECT startdate + intervl as startdate, enddate + intervl as enddate, intervl as intervl
      FROM times
      WHERE startdate + intervl <= @MaxDate
      )
   select startdate, enddate from times;

go

So if you do a select from that function all by itself you get a table of time intervals like this:

fn_daterange('12/14/2008 10:00:00', '12/14/2008 20:00:00', '01:00:00' )

returns:

startdate               enddate                 intervl                 
----------------------- ----------------------- ----------------------- 
2008-12-14 10:00:00.000 2008-12-14 10:59:59.997 1900-01-01 01:00:00.000 
2008-12-14 11:00:00.000 2008-12-14 11:59:59.997 1900-01-01 01:00:00.000 
2008-12-14 12:00:00.000 2008-12-14 12:59:59.997 1900-01-01 01:00:00.000 
2008-12-14 13:00:00.000 2008-12-14 13:59:59.997 1900-01-01 01:00:00.000 
2008-12-14 14:00:00.000 2008-12-14 14:59:59.997 1900-01-01 01:00:00.000 
2008-12-14 15:00:00.000 2008-12-14 15:59:59.997 1900-01-01 01:00:00.000 
2008-12-14 16:00:00.000 2008-12-14 16:59:59.997 1900-01-01 01:00:00.000 
2008-12-14 17:00:00.000 2008-12-14 17:59:59.997 1900-01-01 01:00:00.000 
2008-12-14 18:00:00.000 2008-12-14 18:59:59.997 1900-01-01 01:00:00.000 
2008-12-14 19:00:00.000 2008-12-14 19:59:59.997 1900-01-01 01:00:00.000 
2008-12-14 20:00:00.000 2008-12-14 20:59:59.997 1900-01-01 01:00:00.000

Then I made a sample table of event data:

    eventdate               eventnote            
    ----------------------- -------------------- 
    2008-12-14 10:01:00.000 oo! an event!        
    2008-12-14 10:01:00.000 oo! an event!        
    2008-12-14 10:01:00.000 oo! an event!        
    2008-12-14 10:01:00.000 oo! an event!        
    2008-12-14 10:23:00.000 oo! an event!        
    2008-12-14 10:23:00.000 oo! an event!        
    2008-12-14 10:23:00.000 oo! an event!        
    2008-12-14 11:23:00.000 oo! an event!        
    2008-12-14 11:23:00.000 oo! an event!        
    2008-12-14 11:23:00.000 oo! an event!        
    2008-12-14 11:23:00.000 oo! an event!        
    2008-12-14 11:23:00.000 oo! an event!        
    2008-12-14 14:23:00.000 oo! an event!        
    2008-12-14 14:23:00.000 oo! an event!        
    2008-12-14 14:23:00.000 oo! an event!        
    2008-12-14 19:23:00.000 oo! an event!        
    2008-12-14 19:23:00.000 oo! an event!        
    2008-12-14 19:23:00.000 oo! an event!        
    2008-12-14 19:23:00.000 oo! an event!        
    2008-12-14 19:00:00.000 oo! an event!        
    2008-12-14 19:00:00.000 oo! an event!        
    2008-12-14 19:00:00.000 oo! an event!        

    22 Row(s) affected

Then I hooked them together with a LEFT OUTER JOIN like so:

select
   dr.startdate,
   dr.enddate,
   count(me.eventdate) as eventcount
from
   fn_daterange('12/14/2008 10:00:00', '12/14/2008 20:00:00', '01:00:00' ) dr

   LEFT OUTER JOIN myevents me
      on ( me.eventdate between dr.startdate and dr.enddate)
group by
   dr.startdate,
   dr.enddate


startdate               enddate                 eventcount 
----------------------- ----------------------- ---------- 
2008-12-14 10:00:00.000 2008-12-14 10:59:59.993 7          
2008-12-14 11:00:00.000 2008-12-14 11:59:59.993 5          
2008-12-14 12:00:00.000 2008-12-14 12:59:59.993 0          
2008-12-14 13:00:00.000 2008-12-14 13:59:59.993 0          
2008-12-14 14:00:00.000 2008-12-14 14:59:59.993 3          
2008-12-14 15:00:00.000 2008-12-14 15:59:59.993 0          
2008-12-14 16:00:00.000 2008-12-14 16:59:59.993 0          
2008-12-14 17:00:00.000 2008-12-14 17:59:59.993 0          
2008-12-14 18:00:00.000 2008-12-14 18:59:59.993 0          
2008-12-14 19:00:00.000 2008-12-14 19:59:59.993 7          
2008-12-14 20:00:00.000 2008-12-14 20:59:59.993 0          

11 Row(s) affected

HOLY CRAP that is sweet - I can use this for all kinds of analysis at work! :-)

Thanks Fred for the question and Dave for the info on common table queries!

Ron

Ron Savage