views:

51

answers:

1

Hi

I've got records that indicate when a job starts and when it ends. The End time is not recorded until the job ends, but the start time exists from the time the job starts. What I want is to know how many jobs were running in a given time period.

declare @data table (
    JobId INT IDENTITY(1,1),
    StartedAt DATETIME NOT NULL,
    EndedAt DATETIME NULL
)
insert into @data (StartedAt, EndedAt) 
select '1 Jan 2010 8:00', '1 Jan 2010 8:30'
union select '1 Jan 2010 8:00', '1 Jan 2010 9:00'
union select '1 Jan 2010 8:00', '1 Jan 2010 9:20'
union select '1 Jan 2010 9:00', '1 Jan 2010 9:20'
union select '1 Jan 2010 9:10', NULL

Given the above, how would I query the number of jobs running in each hour? I would expect the results to indicate that there were 3 jobs running in the 8:00 through 8:59 time period, and 4 running in the 9:00 through 9:59 period, as follows:

Time period     Jobs
08:00..08:59    3
09:00..09:59    4

I can modify the schema and I have some influence over how data is recorded if that would make the query simpler or more efficient.Blockquote

A: 

Without spending huge amounts of time on the problem, I'd suggest the simplest solution would be to create a table of days, and a table of hours (then you have a table of hours and days by creating a cartesian join) e.g.

days (ondate not null) slots (start_time time not null, end_time no_null)

then identifying the number of jobs which had a start or end time within those slots:

SELECT days.ondate, slots.start_time, slots.end_time, (
   SELECT COUNT(*) FROM @data
   WHERE (@data.start_time BETWEEN
       STR_TO_DATE(
          CONCATENATE(DATE_FORMAT(days.ondate, 'YYYYMMDD'),
             DATE_FORMAT(slots.start_time,'HH24MISS')), 
          'YYYYMMDDHH25MISS')
     AND
       STR_TO_DATE(
          CONCATENATE(DATE_FORMAT(days.ondate, 'YYYYMMDD'),
             DATE_FORMAT(slots.end_time,'HH24MISS')), 
          'YYYYMMDDHH25MISS')
  ) OR (@data.start_time BETWEEN
          CONCATENATE(DATE_FORMAT(days.ondate, 'YYYYMMDD'),
             DATE_FORMAT(slots.start_time,'HH24MISS')), 
          'YYYYMMDDHH25MISS')
     AND
       STR_TO_DATE(
          CONCATENATE(DATE_FORMAT(days.ondate, 'YYYYMMDD'),
             DATE_FORMAT(slots.end_time,'HH24MISS')), 
          'YYYYMMDDHH25MISS')
  ) AS jobs_running
  FROM days, slots
  ;

Of course this is going to get very inefficient very quickly - so you might want to think about pre-aggregation.

symcbean