Hi everybody,
First of all, I am running on DB2 for i5/OS V5R4. I have ROW_NUMBER(), RANK() and common table expressions. I do not have TOP n PERCENT or LIMIT OFFSET.
The actual data set I'm working with is hard to explain, so let's just say I have a weather history table where the columns are (city, temperature, timestamp). I want to compare medians to averages for each group (city).
This was the cleanest way I found to get a median for a whole table aggregation. I adapted it from the IBM Redbook here:
WITH base_t AS
( SELECT temperature, row_number() over (order by temperature) AS rownum FROM t ),
count_t AS
( SELECT COUNT(temperature) + 1 AS base_count FROM base_t ),
median_t AS
( SELECT temperature FROM base_t, count_t
WHERE rownum in (FLOOR(base_count/2e0), CEILING(base_count/2e0)) )
SELECT DECIMAL(AVG(temperature),10,2) AS median FROM median_t
That works well for getting a single row back, but it seems to fall apart for grouping. Conceptually, this is what I want:
SELECT city, AVG(temperature), MEDIAN(temperature) FROM ...
city          | mean_temp | median_temp
=======================================
'Minneapolis' | 60        | 64
'Milwaukee'   | 65        | 66
'Muskegon'    | 70        | 61
There could be an answer that makes me look stupid, but I'm having a mental block and this isn't my #1 thing to work on right now. It seems like it should be possible, but I can't use anything extremely complex, since it's a large table and I want to be able to choose which columns are aggregated.
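For reference, the obvious extension of the single-row query would be to number the rows within each city and join the per-city counts back in. This is only a rough sketch, not something I have run against the real table. It assumes that ROW_NUMBER() accepts PARTITION BY on V5R4, that the table is called t with the (city, temperature, timestamp) columns described above, and that temperature is not an integer column (I believe AVG on an integer column would truncate the average of the two middle rows):

```sql
-- Sketch only: per-city median next to the per-city mean.
-- Assumes table t(city, temperature, timestamp) and a non-integer temperature column.
WITH base_t AS
( -- number the rows separately inside each city, ordered by temperature
  SELECT city, temperature,
         ROW_NUMBER() OVER (PARTITION BY city ORDER BY temperature) AS rownum
  FROM t ),
count_t AS
( -- one row per city: row count + 1 (same trick as above) and the plain average
  SELECT city,
         COUNT(temperature) + 1 AS base_count,
         AVG(temperature)       AS mean_temp
  FROM base_t
  GROUP BY city ),
median_t AS
( -- keep only the middle row (odd count) or the two middle rows (even count) per city
  SELECT b.city, b.temperature
  FROM base_t b
  JOIN count_t c ON b.city = c.city
  WHERE b.rownum IN (FLOOR(c.base_count/2e0), CEILING(c.base_count/2e0)) )
SELECT m.city,
       c.mean_temp,
       DECIMAL(AVG(m.temperature),10,2) AS median_temp
FROM median_t m
JOIN count_t c ON m.city = c.city
GROUP BY m.city, c.mean_temp
```

The only real changes from the whole-table version are the PARTITION BY in base_t and the join on city in median_t, so swapping in a different grouping column or measure should mean touching just those names. Whether this performs acceptably on a large table is a separate question, since base_t is read twice.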