I have to work with an analysis tool that measures the number of web service calls to a server per hour. These measurements are stored in a database. Here is a snippet of such a measurement:

mysql> SELECT * FROM sample s LIMIT 4;  
+---------+------+-------+  
| service | hour | calls |  
+---------+------+-------+  
| WS04    |   04 |    24 |  
| WS12    |   11 |    89 |  
| WSI64   |   03 |    35 |  
| WSX52   |   01 |    25 |  
+---------+------+-------+  
4 rows in set (0.00 sec)

As the end result I would like to know the total number of web service calls per hour of the day. Obviously, this can easily be done with SUM() and GROUP BY:

mysql> SELECT hour, SUM(calls) FROM sample s GROUP BY hour;  
+------+------------+  
| hour | SUM(calls) |  
+------+------------+  
|   00 |        634 |  
|   01 |        642 |  
|   02 |        633 |  
|   03 |        624 |  
|   04 |        420 |  
|   05 |        479 |  
|   06 |        428 |  
|   07 |        424 |  
|   08 |        473 |  
|   09 |        434 |  
|   10 |        485 |  
|   11 |        567 |  
|   12 |        526 |  
|   13 |        513 |  
|   14 |        555 |  
|   15 |        679 |  
|   16 |        624 |  
|   17 |        796 |  
|   18 |        752 |  
|   19 |        843 |  
|   20 |        827 |  
|   21 |        774 |  
|   22 |        647 |  
|   23 |        533 |  
+------+------------+  
24 rows in set (0.00 sec)

My problem is that in older data sets, the calls for hours 00 to 11 were already summed up and stored under hour 00. The simple statement above would therefore return:

mysql> SELECT hour, SUM(calls) FROM sample s GROUP BY hour;  
+------+------------+  
| hour | SUM(calls) |  
+------+------------+  
|   00 |       6243 | <------ sum of hours 00-11!  
|   12 |        526 |  
|   13 |        513 |  
|   14 |        555 |  
|   15 |        679 |  
|   16 |        624 |  
|   17 |        796 |  
|   18 |        752 |  
|   19 |        843 |  
|   20 |        827 |  
|   21 |        774 |  
|   22 |        647 |  
|   23 |        533 |  
+------+------------+  
13 rows in set (0.00 sec)
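The effect can be reproduced in a few lines. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL; the service names WS_NEW and WS_OLD and all call counts are hypothetical, chosen so that one service uses the new per-hour format and the other the old format, with hours 00 to 11 pre-summed under hour 0.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL "sample" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (service TEXT, hour INTEGER, calls INTEGER)")

# Hypothetical new-format service: one row per hour 0..23, 10 calls each.
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [("WS_NEW", h, 10) for h in range(24)],
)
# Hypothetical old-format service: hours 0..11 pre-summed into a single
# hour-0 row (12 hours x 20 calls = 240), then one row per hour for 12..23.
conn.execute("INSERT INTO sample VALUES ('WS_OLD', 0, 240)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [("WS_OLD", h, 10) for h in range(12, 24)],
)

# The plain GROUP BY from the question.
per_hour = dict(
    conn.execute("SELECT hour, SUM(calls) FROM sample GROUP BY hour").fetchall()
)
print(per_hour[0], per_hour[1], per_hour[12])  # prints: 250 10 20
```

Hour 0 absorbs the whole old-format morning, while hours 1 to 11 only see the new-format service.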

This is an undesirable result. To make the old sets [00,12,...,23] comparable to the new sets [00,01,...,23], I would like a single statement that takes the value stored under [00] and divides it evenly over the twelve hours 00 to 11, e.g.:

+------+------------+  
| hour | SUM(calls) |  
+------+------------+  
|   00 |    6243/12 |  
|   01 |    6243/12 |  
            [...]  
|   12 |        526 |  
            [...]  
|   23 |        533 |  
+------+------------+
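A quick arithmetic check on the numbers from the first result set shows why the divisor should be 12 rather than 11: the pre-summed value covers hours 00 through 11 inclusive, which is twelve hours, and dividing by 12 keeps the daily total unchanged. This is only an illustration of the arithmetic, not part of any query.

```python
# Hourly totals for hours 00..11 taken from the new-format result set above.
morning = [634, 642, 633, 624, 420, 479, 428, 424, 473, 434, 485, 567]

total = sum(morning)        # the value an old set stores under hour 00
share = total / 12          # hours 00 through 11 inclusive = 12 hours
distributed = [share] * 12  # even split over hours 00..11

print(total, share)         # prints: 6243 520.25
```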

I can easily do this with temporary tables or views, but I don't know how to accomplish it without them.

Any ideas? This is driving me crazy :P

A:

You're probably best off doing this with temporary tables or views (I'd recommend a view over a temporary table); otherwise you will end up with a nasty, case-specific statement that will be a nightmare to maintain over time.

ck
Can you actually create views over temporary tables? I agree that temp tables are likely the cleanest way to go.
Jonathan Leffler
I mean a view INSTEAD OF a temp table. A view OF a temp table would be fairly pointless; you might as well create the temp table correctly in the first place.
ck
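As a rough sketch of the view-based route ck recommends: the definition below folds the old-format rows into hours 00 to 11 once, so every report can query the view with a plain GROUP BY. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the helper tables old_services and morning_hours, the view name normalized_sample, and all data are hypothetical. On the question above: MySQL does not allow a view definition to refer to a TEMPORARY table, which is why the helpers here are ordinary tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (service TEXT, hour INTEGER, calls INTEGER)")
# Hypothetical data: WS_OLD stores hours 0..11 pre-summed under hour 0.
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [("WS_OLD", 0, 240)]
    + [("WS_OLD", h, 10) for h in range(12, 24)]
    + [("WS_NEW", h, 10) for h in range(24)],
)

# Hypothetical helper tables: which services use the old format, and hours 0..11.
conn.execute("CREATE TABLE old_services (service TEXT PRIMARY KEY)")
conn.execute("INSERT INTO old_services VALUES ('WS_OLD')")
conn.execute("CREATE TABLE morning_hours (hour INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO morning_hours VALUES (?)", [(h,) for h in range(12)])

# The view spreads each old-format hour-0 row over 12 rows of calls / 12.0
# and passes every other row through unchanged.
conn.execute("""
    CREATE VIEW normalized_sample AS
    SELECT s.service AS service, m.hour AS hour, s.calls / 12.0 AS calls
    FROM sample AS s
    JOIN old_services AS o ON o.service = s.service
    CROSS JOIN morning_hours AS m
    WHERE s.hour = 0
    UNION ALL
    SELECT s.service, s.hour, s.calls
    FROM sample AS s
    WHERE NOT (s.hour = 0 AND s.service IN (SELECT service FROM old_services))
""")

# Reports now only need the simple GROUP BY.
per_hour = dict(conn.execute(
    "SELECT hour, SUM(calls) FROM normalized_sample GROUP BY hour"
).fetchall())
print(per_hour[0], per_hour[11], per_hour[12])
```

Keeping the list of old-format services in a table rather than in the query text means new exceptions can be added without touching any report.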
A: 

You'll need a rowset with 12 rows in it to make a join.

The simplest solution is to combine 12 SELECT statements with UNION ALL:

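SELECT  COALESCE(morning.hour, sample.hour) AS hour,
        -- old-format rows are split evenly over hours 0-11; all other rows count in full
        SUM(CASE WHEN morning.hour IS NULL THEN calls ELSE calls / 12 END) AS calls
FROM    sample
LEFT JOIN
        (
        -- 12-row derived table: one row per morning hour, 0 through 11
        SELECT 0 AS hour
        UNION ALL
        SELECT 1
        ...
        UNION ALL
        SELECT 11
        ) AS morning
-- only hour-0 rows of the old (pre-summed) services match, fanning out into 12 rows
ON      sample.hour = 0 AND sample.service IN ('old_service1', 'old_service2')
GROUP BY
        1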
Quassnoi
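The join can be checked end to end on toy data. The sketch below runs this query in SQLite via Python's sqlite3 module as a stand-in for MySQL, with the twelve SELECTs of the derived table generated in a loop; 'WS_OLD' plays the role of the old_service placeholders, and the data is hypothetical. One adjustment: SQLite's / on two integers truncates (MySQL's / returns a decimal), so the sketch divides by 12.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (service TEXT, hour INTEGER, calls INTEGER)")
# Hypothetical data: WS_OLD is old-format (hours 0..11 summed under hour 0).
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [("WS_OLD", 0, 240)]
    + [("WS_OLD", h, 10) for h in range(12, 24)]
    + [("WS_NEW", h, 10) for h in range(24)],
)

# SELECT 0 AS hour UNION ALL SELECT 1 ... UNION ALL SELECT 11
morning_sql = " UNION ALL ".join(
    ["SELECT 0 AS hour"] + [f"SELECT {h}" for h in range(1, 12)]
)

query = f"""
    SELECT  COALESCE(morning.hour, sample.hour) AS hour,
            SUM(CASE WHEN morning.hour IS NULL THEN sample.calls
                     ELSE sample.calls / 12.0 END) AS calls
    FROM    sample
    LEFT JOIN ({morning_sql}) AS morning
    ON      sample.hour = 0 AND sample.service IN ('WS_OLD')
    GROUP BY 1
    ORDER BY 1
"""
rows = conn.execute(query).fetchall()
per_hour = dict(rows)
print(len(rows), per_hour[0], per_hour[12])
```

The old-format hour-0 row matches all 12 morning rows and contributes 240 / 12.0 = 20 to each, while rows that match nothing keep their own hour and full value, so the daily total is preserved.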
Some of the data has actual values for hours 0, 1, ..., 11, so you should only distribute the value when there is no data for the missing hours.
Jonathan Leffler
Updated to handle this.
Quassnoi
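Jonathan Leffler's point can also be handled without a hard-coded service list. The sketch below, again in SQLite via Python's sqlite3, treats a service as old-format when it has no rows at all for hours 1 to 11. That is an assumption: a genuinely new-format service with no calls all morning would be misclassified. It also swaps the twelve UNION ALL SELECTs for a recursive CTE, which on the MySQL side requires 8.0 or later. Service names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (service TEXT, hour INTEGER, calls INTEGER)")
# Hypothetical data: WS_OLD has no rows for hours 1..11, WS_NEW has all 24 hours.
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [("WS_OLD", 0, 240)]
    + [("WS_OLD", h, 10) for h in range(12, 24)]
    + [("WS_NEW", h, 10) for h in range(24)],
)

query = """
    WITH RECURSIVE morning(hour) AS (       -- rows 0, 1, ..., 11
        SELECT 0
        UNION ALL
        SELECT hour + 1 FROM morning WHERE hour < 11
    )
    SELECT  COALESCE(morning.hour, s.hour) AS hour,
            SUM(CASE WHEN morning.hour IS NULL THEN s.calls
                     ELSE s.calls / 12.0 END) AS calls
    FROM    sample AS s
    LEFT JOIN morning
    ON      s.hour = 0
    -- assumed old format: the service has no rows at all for hours 1..11
    AND     NOT EXISTS (SELECT 1 FROM sample AS s2
                        WHERE s2.service = s.service
                          AND s2.hour BETWEEN 1 AND 11)
    GROUP BY 1
    ORDER BY 1
"""
per_hour = dict(conn.execute(query).fetchall())
print(per_hour[0], per_hour[11], per_hour[12])
```

Because the hour-0 row of WS_NEW fails the NOT EXISTS test, it stays in hour 0 at full value, and only WS_OLD is spread over the morning.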