views:

64

answers:

2

In MSSQL2008, I am trying to compute the median of a column of numbers from a common table expression using the classic median query as follows:

WITH cte AS
(
   SELECT number
   FROM table
) 

SELECT cte.*,
(SELECT 
  (SELECT (   
    (SELECT TOP 1 cte.number  
     FROM     
     (SELECT TOP 50 PERCENT cte.number     
      FROM cte
      ORDER BY cte.number) AS medianSubquery1   
    ORDER BY cte.number DESC)  
    +   
  (SELECT TOP 1 cte.number
   FROM     
    (SELECT TOP 50 PERCENT cte.number    
     FROM cte   
     ORDER BY cte.number DESC) AS medianSubquery2   
   ORDER BY cte.number ASC) ) / 2)) AS median

FROM cte
ORDER BY cte.number

The result set that I get is the following:

NUMBER    MEDIAN
x1        x1
x1        x1
x1        x1
x2        x2
x3        x3

In other words, the "median" column is the same as the "number" column when I would expect the median column to be "x1" all the way down. I use a similar expression to compute the mode and it works fine over the same common table expression.

+1  A: 

The problem with your query is that you are doing

SELECT TOP 1 cte.number FROM...

but it isn't correlated with the sub query it is correlated with the Outer query so the subquery is irrelevant. Which explains why you simply end up with the same value all the way down. Removing the cte. (as below) gives the median of the CTE. Which is a constant value. What are you trying to do?

WITH cte AS
    ( SELECT NUMBER
    FROM master.dbo.spt_values
    WHERE TYPE='p'
    )

SELECT cte.*,
(SELECT 
  (SELECT (   
    (SELECT TOP 1 number  
     FROM     
     (SELECT TOP 50 PERCENT cte.number     
      FROM cte
      ORDER BY cte.number) AS medianSubquery1   
    ORDER BY number DESC)  
    +   
  (SELECT TOP 1 number
   FROM     
    (SELECT TOP 50 PERCENT cte.number    
     FROM cte   
     ORDER BY cte.number DESC) AS medianSubquery2   
   ORDER BY number ASC) ) / 2)) AS median
FROM cte
ORDER BY cte.number

Returns

NUMBER      median
----------- -----------
0           1023
1           1023
2           1023
3           1023
4           1023
5           1023
6           1023
7           1023
Martin Smith
The only difference that I see with your query is the addition of a WHERE clause in the common table expression and I'm not sure what that WHERE clause is supposed to do.
Dan
Ah, I didn't understand that specifying the CTE name as a qualifier had that effect. In practice, I use the two columns for charting purposes so the constant median value is useful for charting a second series across the chart at the median of the first series.Thanks for your help.
Dan
@Martin Smith - Technically, the output is incorrect. 1023.50 should the median. Thus, shouldn't you need to convert Number to a decimal or float (e.g. Number * 1.000) in case you have an even number of rows and need to do the division?
Thomas
+1  A: 

Here's a slightly different way to do it:

WITH cte AS
(
   SELECT number
   FROM table1
)
SELECT T1.number, T3.median
FROM cte T1, 
(
    SELECT AVG(number) AS median
    FROM
    (
        SELECT number, ROW_NUMBER() OVER(ORDER BY number) AS rn
        FROM cte
    ) T2
    WHERE T2.rn = ((SELECT COUNT(*) FROM table1) + 1) / 2
    OR T2.rn = ((SELECT COUNT(*) FROM table1) + 2) / 2
) T3
Mark Byers
Thanks. This is certainly a cleaner and easier way of computing the median.
Dan