views:

1107

answers:

1

Hi everybody,

First of all, I am running on DB2 for i5/OS V5R4. I have ROW_NUMBER(), RANK() and common table expressions. I do not have TOP n PERCENT or LIMIT OFFSET.

The actual data set I'm working with is hard to explain, so let's just say I have a weather history table where the columns are (city, temperature, timestamp). I want to compare medians to averages for each group (city).

This was the cleanest way I found to get a median for a whole table aggregation. I adapted it from the IBM Redbook here:

WITH base_t AS
( SELECT temp, row_number() over (order by temperature) AS rownum FROM t ),
count_t AS
( SELECT COUNT(temperature) + 1 AS base_count FROM base_t ),
median_t AS
( SELECT temperature FROM base_t, count_t
  WHERE rownum in (FLOOR(base_count/2e0), CEILING(base_count/2e0)) )
SELECT DECIMAL(AVG(temperature),10,2) AS median FROM median_t

That works well for getting a single row back, but it seems to fall apart for grouping. Conceptually, this is what I want:


SELECT city, AVG(temperature), MEDIAN(temperature) FROM ...

city           | mean_temp       | median_temp       
===================================================
'Minneapolis'  | 60              | 64
'Milwaukee'    | 65              | 66
'Muskegon'     | 70              | 61

There could be an answer that makes me look stupid, but I'm having a mental block and this isn't my #1 thing to work on right now. Seems like it could be possible, but I can't use something that's extremely complex since it's a large table and I want the ability to customize which columns are being aggregated.

A: 

In SQL Server, agreagate functions like count(*) can be partitioned and calculated without a group by. I looked quickly through the referenced redbook, and it looks like DB2 has the same feature. But if not, then this won't work:

create table TemperatureHistory 
    (City varchar(20)
    , Temperature decimal(5, 2)
    , DateTaken datetime)

insert into TemperatureHistory values ('Minneapolis', 61, '20090101')
insert into TemperatureHistory values ('Minneapolis', 59, '20090102')

insert into TemperatureHistory values ('Milwaukee', 65, '20090101')
insert into TemperatureHistory values ('Milwaukee', 65, '20090102')
insert into TemperatureHistory values ('Milwaukee', 100, '20090103')

insert into TemperatureHistory values ('Muskegon', 80, '20090101')
insert into TemperatureHistory values ('Muskegon', 70, '20090102')
insert into TemperatureHistory values ('Muskegon', 70, '20090103')
insert into TemperatureHistory values ('Muskegon', 20, '20090104')

; with base_t as
    (select city
     , Temperature
     , row_number() over (partition by city order by temperature) as RowNum
     , (count(*) over (partition by city)) + 1 as CountPlusOne 
    from TemperatureHistory)
select City
    , avg(Temperature) as MeanTemp
    , avg(case 
     when RowNum in (FLOOR(CountPlusOne/2.0), CEILING(CountPlusOne/2.0)) 
      then Temperature
      else null end) as MedianTemp
from base_t 
group by City
Shannon Severance
I will give this a try, I really need to get better at understanding OLAP. Thanks.
twblamer
I'm setting this as the accepted answer. It works with the count(*) in DB2 LUW 9.5, but unfortunately not in DB2 for i. Since no one else had any ideas, I guess I'm going to have to live with this limitation. Thanks again.
twblamer