views:

400

answers:

9

My SAMPLE table has the following five columns:

sample_id (PK) (NUMBER)
sampled_on (DATE)
received_on (DATE)
completed_on (DATE)
authorized_on (DATE)

I would like a query with one row per hour (constrained by a given date range) and five columns:

  1. The hour YYYY-MM-DD HH24
  2. Number of samples sampled during that hour
  3. Number of samples received during that hour
  4. Number of samples completed during that hour
  5. Number of samples authorized during that hour

Please provide a query or at least a point in the right direction.

Reopened with bounty:
  +300 reputation for the first person to incorporate Rob van Wijk's answer (single access to sample) into a view where I can efficiently query by date range (start_date/end_date or start_date/num_days).

A: 

I'd do are 4 queries like this (one for each date):

SELECT <date to hour>, count(*) FROM sample GROUP BY <date to hour>

And then put the data together in the application. If you really want a single query, you can join the individual queries on hour.

Lukáš Lalinský
Lukáš Lalinský: try for bounty.
Steven
+2  A: 

This may not be the prettiest or most optimal solution, but it seems to work. Explanation: first convert all the dates to YYYY-MM-DD HH24 format, next gather number sampled/received/completed/authorized by date+HH24, finally join together.

with sample_hour as
    (select sample_id, 
            to_char(sampled_on, 'YYYY-MM-DD HH24') sampled_on,
            to_char(received_on, 'YYYY-MM-DD HH24') received_on,
            to_char(completed_on, 'YYYY-MM-DD HH24') completed_on,
            to_char(authorized_on, 'YYYY-MM-DD HH24') authorized_on
     from sample),
s as 
    (select sampled_on thedate, count(*) num_sampled 
     from sample_hour 
     group by sampled_on),
r as 
    (select received_on thedate, count(*) num_received 
     from sample_hour 
     group by received_on),
c as 
    (select completed_on thedate, count(*) num_completed 
     from sample_hour 
     group by completed_on),
a as 
    (select authorized_on thedate, count(*) num_authorized 
     from sample_hour 
     group by authorized_on)
select s.thedate, num_sampled, num_received, num_completed, num_authorized 
from s 
left join r on s.thedate = r.thedate
left join c on s.thedate = c.thedate
left join a on s.thedate = a.thedate
;

This assumes a table "sample" created something like this:

create table sample
    (sample_id number not null primary key,
     sampled_on date,
     received_on date,
     completed_on date,
     authorized_on date);
shoover
shoover: try for bounty.
Steven
A: 

Try this...

WITH src_data AS
        ( SELECT sample_id
               , TRUNC( sampled_on, 'HH24' )   sampled_on
               , TRUNC( received_on, 'HH24' )   received_on
               , TRUNC( completed_on, 'HH24' )   completed_on
               , TRUNC( authorized_on, 'HH24' )   authorized_on
            FROM sample
        )
   , src_hours AS
        ( SELECT sampled_on   the_date 
            FROM src_data
           WHERE sampled_on IS NOT NULL
           UNION
          SELECT received_on   the_date
            FROM src_data
           WHERE received_on IS NOT NULL
           UNION
          SELECT completed_on   the_date
            FROM src_data
           WHERE completed_on IS NOT NULL
           UNION
          SELECT authorized_on   the_date
            FROM src_data
           WHERE authorized_on IS NOT NULL
        )
SELECT h.the_date
     , ( SELECT COUNT(*) 
           FROM src_data s
          WHERE s.sampled_on = h.the_date )   num_sampled_on
     , ( SELECT COUNT(*)
           FROM src_data r
          WHERE r.received_on = h.the_date )   num_received_on
     , ( SELECT COUNT(*)
           FROM src_data c
          WHERE c.completed_on = h.the_date )   num_completed_on
     , ( SELECT COUNT(*)
           FROM src_data a
          WHERE a.authorized_on = h.the_date )   num_authorized_on
  FROM src_hours h
Paul James
Paul James: try for bounty.
Steven
+2  A: 

Here is an example. First create the table and insert some random data.

SQL> create table sample
  2  ( sample_id     number primary key
  3  , sampled_on    date
  4  , received_on   date
  5  , completed_on  date
  6  , authorized_on date
  7  )
  8  /

Tabel is aangemaakt.

SQL> insert into sample
  2   select level
  3        , trunc(sysdate) + dbms_random.value(0,2)
  4        , trunc(sysdate) + dbms_random.value(0,2)
  5        , trunc(sysdate) + dbms_random.value(0,2)
  6        , trunc(sysdate) + dbms_random.value(0,2)
  7     from dual
  8  connect by level <= 1000
  9  /

1000 rijen zijn aangemaakt.

Then introduce the variables for your given date range and fill them.

SQL> var DATE_RANGE_START varchar2(10)
SQL> var DATE_RANGE_END varchar2(10)
SQL> exec :DATE_RANGE_START := '2009-10-23'

PL/SQL-procedure is geslaagd.

SQL> exec :DATE_RANGE_END := '2009-10-24'

PL/SQL-procedure is geslaagd.

First you'll have to generate all hours in your given date range. This makes sure that in case you have an hour where no dates are present, you'll still have a record with 4 zeros. The implementation is in the all_hours query. The rest of the query (with only one table access to your sample table!) can then be quite simple like this.

SQL> with all_hours as
  2  ( select to_date(:DATE_RANGE_START,'yyyy-mm-dd') + numtodsinterval(level-1,'hour') hour
  3      from dual
  4   connect by level <=
  5           (  to_date(:DATE_RANGE_END,'yyyy-mm-dd')
  6            - to_date(:DATE_RANGE_START,'yyyy-mm-dd')
  7            + 1
  8           ) * 24
  9  )
 10  select h.hour
 11       , count(case when h.hour = trunc(s.sampled_on,'hh24') then 1 end) sampled#
 12       , count(case when h.hour = trunc(s.received_on,'hh24') then 1 end) received#
 13       , count(case when h.hour = trunc(s.completed_on,'hh24') then 1 end) completed#
 14       , count(case when h.hour = trunc(s.authorized_on,'hh24') then 1 end) authorized#
 15    from all_hours h
 16         cross join sample s
 17   group by h.hour
 18  /

HOUR                  SAMPLED#  RECEIVED# COMPLETED# AUTHORIZED#
------------------- ---------- ---------- ---------- -----------
23-10-2009 00:00:00         18         25         20          20
23-10-2009 01:00:00         26         24         16          13
23-10-2009 02:00:00         16         26         17          15
23-10-2009 03:00:00         19         18         27          13
23-10-2009 04:00:00         28         20         18          23
23-10-2009 05:00:00         17         13         19          21
23-10-2009 06:00:00         18         23         16          15
23-10-2009 07:00:00         19         24         14          22
23-10-2009 08:00:00         21         19         23          22
23-10-2009 09:00:00         25         20         23          24
23-10-2009 10:00:00         16         21         25          18
23-10-2009 11:00:00         21         29         21          18
23-10-2009 12:00:00         33         28         24          20
23-10-2009 13:00:00         24         19         15          15
23-10-2009 14:00:00         20         27         16          25
23-10-2009 15:00:00         15         25         27          13
23-10-2009 16:00:00         19         14         27          18
23-10-2009 17:00:00         22         22         15          27
23-10-2009 18:00:00         20         19         29          23
23-10-2009 19:00:00         20         18         17          23
23-10-2009 20:00:00         11         18         20          27
23-10-2009 21:00:00         13         25         24          19
23-10-2009 22:00:00         22         13         22          29
23-10-2009 23:00:00         20         20         19          24
24-10-2009 00:00:00         18         17         18          29
24-10-2009 01:00:00         23         30         26          21
24-10-2009 02:00:00         28         19         28          25
24-10-2009 03:00:00         21         21         11          23
24-10-2009 04:00:00         23         20         21          17
24-10-2009 05:00:00         24         16         23          23
24-10-2009 06:00:00         23         26         22          30
24-10-2009 07:00:00         25         26         18          12
24-10-2009 08:00:00         24         20         23          17
24-10-2009 09:00:00         18         26         15          19
24-10-2009 10:00:00         20         19         25          18
24-10-2009 11:00:00         19         27         17          20
24-10-2009 12:00:00         23         16         18          20
24-10-2009 13:00:00         15         15         22          19
24-10-2009 14:00:00         23         23         16          29
24-10-2009 15:00:00         18         31         32          28
24-10-2009 16:00:00         22         15         18          13
24-10-2009 17:00:00         25         17         20          26
24-10-2009 18:00:00         19         20         21          16
24-10-2009 19:00:00         22         13         28          29
24-10-2009 20:00:00         23         17         23          14
24-10-2009 21:00:00         18         18         21          22
24-10-2009 22:00:00         22         20         18          21
24-10-2009 23:00:00         21         18         22          22

48 rijen zijn geselecteerd.

Hope this helps.

Regards, Rob.

Rob van Wijk
Incorporate this answer into a view that can efficiently filter by any date range for 300 reputation.
Steven
A: 

Maybe somthing like creating this view:

create view hours as
select hour, max(cnt_sample) cnt_sample, max(cnt_received) cnt_received, max(cnt_completed) cnt_completed, max(cnt_authorized) cnt_authorized
  from (
    select to_char(sampled_on   , 'yyyymmddhh24') hour, 
           count(sample_id) over (partition by to_char(sampled_on    ,'yyyymmddhh24')) cnt_sample,    
           0                                                                           cnt_received,
           0                                                                           cnt_completed, 
           0                                                                           cnt_authorized from sample union all
    select to_char(received_on  , 'yyyymmddhh24') hour, 
           0                                                                           cnt_sample, 
           count(sample_id) over (partition by to_char(received_on   ,'yyyymmddhh24')) cnt_received, 
           0                                                                           cnt_completed, 
           0                                                                           cnt_authorized from sample union all
    select to_char(completed_on , 'yyyymmddhh24') hour, 
           0                                                                           cnt_sample, 
           0                                                                           cnt_received, 
           count(sample_id) over (partition by to_char(completed_on  ,'yyyymmddhh24')) cnt_completed,
           0                                                                           cnt_authorized from sample union all
    select to_char(authorized_on, 'yyyymmddhh24') hour,
           0                                                                           cnt_sample, 
           0                                                                           cnt_received,
           0                                                                           cnt_completed, 
           count(sample_id) over (partition by to_char(authorized_on ,'yyyymmddhh24')) cnt_authorized from sample
  )
group by hour
;

and then selecting from the view:

select * from hours where hour >= '2001010102' and hour <= '2001010105'
order by hour;
René Nyffenegger
I'm looking for an answer that accesses the sample table only once, which I would think would be much more efficient.
Steven
A: 

Here's what I'm thinking, but I'm not sure it's optimal enough for a view.

select 
    the_date,
    sum(decode(the_type,'S',the_count,0)) samples,
    sum(decode(the_type,'R',the_count,0)) receipts,
    sum(decode(the_type,'C',the_count,0)) completions,
    sum(decode(the_type,'A',the_count,0)) authorizations
from(
    select 
     trunc(sampled_on,'HH24') the_date,
     'S' the_type,
     count(1) the_count
    FROM sample
    group by trunc(sampled_on,'HH24')
    union all 
    select 
     trunc(received_on,'HH24'),
     'R',
     count(1)
    FROM sample
    group by trunc(received_on,'HH24')
    union all
    select 
     trunc(completed_on,'HH24'),
     'C',
     count(1)
    FROM sample
    group by trunc(completed_on,'HH24')
    union all
    select 
     trunc(authorized_on,'HH24'),
     'A',
     count(1)
    FROM sample
    group by trunc(authorized_on,'HH24')
)
group by the_date

Then, to query, you could just query with normal date contructs:

select * from magic_view where the_date > sysdate-1;

EDIT

Okay, so I created a sample table and did some metrics:

create table sample ( 
  sample_id     number primary key, 
  sampled_on    date,
  received_on   date,
  completed_on  date,
  authorized_on date
);

insert into sample (
  select 
    level,
    trunc(sysdate) + dbms_random.value(0,2),
    trunc(sysdate) + dbms_random.value(0,2),
    trunc(sysdate) + dbms_random.value(0,2),
    trunc(sysdate) + dbms_random.value(0,2),
  from dual
  connect by level <= 1000
);

The explain plan is:

---------------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |  4000 |    97K|    25  (20)|
|   1 |  HASH GROUP BY        |        |  4000 |    97K|    25  (20)|
|   2 |   VIEW                |        |  4000 |    97K|    24  (17)|
|   3 |    UNION-ALL          |        |       |       |            |
|   4 |     HASH GROUP BY     |        |  1000 |  9000 |     6  (17)|
|   5 |      TABLE ACCESS FULL| SAMPLE |  1000 |  9000 |     5   (0)|
|   6 |     HASH GROUP BY     |        |  1000 |  9000 |     6  (17)|
|   7 |      TABLE ACCESS FULL| SAMPLE |  1000 |  9000 |     5   (0)|
|   8 |     HASH GROUP BY     |        |  1000 |  9000 |     6  (17)|
|   9 |      TABLE ACCESS FULL| SAMPLE |  1000 |  9000 |     5   (0)|
|  10 |     HASH GROUP BY     |        |  1000 |  9000 |     6  (17)|
|  11 |      TABLE ACCESS FULL| SAMPLE |  1000 |  9000 |     5   (0)|
---------------------------------------------------------------------

On my machine, the a query against this view for the past 24 hours completes in 23ms. Not bad, but it's only 1,000 rows. Before you discount the 4 separate queries, you'll need to do performance analysis of the individual solutions.

Josh Bush
Sorry, I'm looking for an answer that accesses the sample table only once, which I would think would be much more efficient.
Steven
A: 

I now propose:

create view hours_ as 
    with four as (   
      select 1 as n from dual union all    
      select 2 as n from dual union all    
      select 3 as n from dual union all    
      select 4 as n from dual ) 
 select   
      case when four.n = 1 then trunc(sampled_on   , 'hh24')    
           when four.n = 2 then trunc(received_on  , 'hh24')
           when four.n = 3 then trunc(completed_on , 'hh24')
           when four.n = 4 then trunc(authorized_on, 'hh24')   
       end                                                     hour_, 
      sum (   case when four.n = 1 then 1 
               else                     0
               end )                                           sample_,   
      sum (   case when four.n = 2 then 1 
               else                     0
               end )                                           receive_,   
      sum (   case when four.n = 3 then 1 
               else                     0
               end )                                           complete_,   
      sum (   case when four.n = 4 then 1 
               else                     0
               end )                                           authorize_ 
from   
    four cross join sample 
group by    
      case when four.n = 1 then trunc(sampled_on   , 'hh24')   
           when four.n = 2 then trunc(received_on  , 'hh24')
           when four.n = 3 then trunc(completed_on , 'hh24')
           when four.n = 4 then trunc(authorized_on, 'hh24')   
    end ;

In order to see if the view is indeed accessed only once:

explain plan for select * from hours_ 
where hour_ between sysdate -1 and sysdate;

select * from table (dbms_xplan.display);

Which results in:

--------------------------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |     1 |    61 |    16   (7)| 00:00:01 |
|   1 |  VIEW                 | HOURS_ |     1 |    61 |    16   (7)| 00:00:01 |
|   2 |   HASH GROUP BY       |        |     1 |    39 |    16   (7)| 00:00:01 |
|*  3 |    FILTER             |        |       |       |            |          |
|   4 |     NESTED LOOPS      |        |     1 |    39 |    15   (0)| 00:00:01 |
|   5 |      VIEW             |        |     4 |    12 |     8   (0)| 00:00:01 |
|   6 |       UNION-ALL       |        |       |       |            |          |
|   7 |        FAST DUAL      |        |     1 |       |     2   (0)| 00:00:01 |
|   8 |        FAST DUAL      |        |     1 |       |     2   (0)| 00:00:01 |
|   9 |        FAST DUAL      |        |     1 |       |     2   (0)| 00:00:01 |
|  10 |        FAST DUAL      |        |     1 |       |     2   (0)| 00:00:01 |
|* 11 |      TABLE ACCESS FULL| SAMPLE |     1 |    36 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------
René Nyffenegger
I've just seen that the execution plan is better if a hint is used in the select statement: select /*+ ordered */and the order of the tables is swapped: sample cross join four
René Nyffenegger
A: 

Similar to René Nyffenegger's idea. Filter by each type of date field, and then amalgamate the counts.

Note, that it's not possible to do this query in one Select, because you need to both Group and Order By each date field, this is impossible without splitting into separate sub-queries.

I have coded a date range of '2009-11-04' to '2009-11-04 23:59:59' for this example:

SELECT
    DateHour,
    SUM(sampled) total_sampled,
    SUM(received) total_received,
    SUM(completed) total_completed,
    SUM(authorized) total_authorized
FROM
    (SELECT
        TO_CHAR(CREATED_DATE, 'YYYY-MM-DD HH24') DateHour,
        1 sampled,
        0 received,
        0 completed,
        0 authorized
    FROM
        SAMPLE
    WHERE
        sampled_on >= TO_DATE('2009-11-04', 'YYYY-MM-DD')
        AND sampled_on <= TO_DATE('2009-11-04 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
    UNION ALL
    SELECT
        TO_CHAR(CREATED_DATE, 'YYYY-MM-DD HH24') DateHour,
        0 sampled,
        1 received,
        0 completed,
        0 authorized
    FROM
        SAMPLE
    WHERE
        received_on >= TO_DATE('2009-11-04', 'YYYY-MM-DD')
        AND received_on <= TO_DATE('2009-11-04 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
    UNION ALL
    SELECT
        TO_CHAR(CREATED_DATE, 'YYYY-MM-DD HH24') DateHour,
        0 sampled,
        0 received,
        1 completed,
        0 authorized
    FROM
        SAMPLE
    WHERE
        completed_on >= TO_DATE('2009-11-04', 'YYYY-MM-DD')
        AND completed_on <= TO_DATE('2009-11-04 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
    UNION ALL
    SELECT
        TO_CHAR(CREATED_DATE, 'YYYY-MM-DD HH24') DateHour,
        0 sampled,
        0 received,
        0 completed,
        1 authorized
    FROM
        SAMPLE
    WHERE
        authorized_on >= TO_DATE('2009-11-04', 'YYYY-MM-DD')
        AND authorized_on <= TO_DATE('2009-11-04 23:59:59', 'YYYY-MM-DD HH24:MI:SS'))
GROUP BY
    DateHour
ORDER BY
    DateHour
Jonathan Swift
+2  A: 

Try:

CREATE OR REPLACE VIEW my_view AS
WITH date_bookends AS (
  SELECT LEAST(MIN(t.sampled_on), MIN(t.received_on), MIN(t.completed_on), MIN(t.authorized_on)) 'min_date'
         GREATEST(MAX(t.sampled_on), MAX(t.received_on), MAX(t.completed_on), MAX(t.authorized_on)) 'max_date'
    FROM SAMPLE t),
    all_hours AS (
  SELECT t.min_date + numtodsinterval(LEVEL - 1,'hour') date_by_hour
    FROM date_bookends t
CONNECT BY LEVEL <= ( t.max_date - t.min_date + 1) * 24)
SELECT h.date_by_hour,
       COUNT(CASE WHEN h.hour = TRUNC(s.sampled_on,'hh24') THEN 1 END) sampled#
       COUNT(CASE WHEN h.hour = TRUNC(s.received_on,'hh24') THEN 1 END) received#
       COUNT(CASE WHEN h.hour = TRUNC(s.completed_on,'hh24') THEN 1 END) completed#
       COUNT(CASE WHEN h.hour = TRUNC(s.authorized_on,'hh24') THEN 1 END) authorized#
  FROM all_hours h
CROSS JOIN sample s
  GROUP BY h.hour

Without using Subquery Factoring:

CREATE OR REPLACE VIEW my_view AS
SELECT h.date_by_hour,
       COUNT(CASE WHEN h.hour = TRUNC(s.sampled_on,'hh24') THEN 1 END) sampled#
       COUNT(CASE WHEN h.hour = TRUNC(s.received_on,'hh24') THEN 1 END) received#
       COUNT(CASE WHEN h.hour = TRUNC(s.completed_on,'hh24') THEN 1 END) completed#
       COUNT(CASE WHEN h.hour = TRUNC(s.authorized_on,'hh24') THEN 1 END) authorized#
  FROM (SELECT t.min_date + numtodsinterval(LEVEL - 1,'hour') date_by_hour
         FROM (SELECT LEAST(MIN(t.sampled_on), MIN(t.received_on), MIN(t.completed_on), MIN(t.authorized_on)) 'min_date'
                            GREATEST(MAX(t.sampled_on), MAX(t.received_on), MAX(t.completed_on), MAX(t.authorized_on)) 'max_date'
                       FROM SAMPLE t) t
CONNECT BY LEVEL <= ( t.max_date - t.min_date + 1) * 24) h
CROSS JOIN sample s
GROUP BY h.hour

The query accesses the SAMPLES table twice - the first time to get the earliest & latest date to frame the construction of the date_by_hour value.

OMG Ponies
Thanks! Actually, I used your max-min idea without needing a second SAMPLE iteration. I used a literal (earlier than any sampled_on date) for `min_date` and SYSDATE+1 as max_date.
Steven
That was my only concern about hardcoding the dates; given the context, SYSDATE makes a lot of sense. Wish I'd thought of it actually...
OMG Ponies