views:

246

answers:

2

Hello,

I have a SQL table with a whole load of records that look like this:

| Date       | Score |
+ -----------+-------+
| 01/01/2010 |     4 |
| 02/01/2010 |     6 |
| 03/01/2010 |    10 |
  ...
| 16/03/2010 |     2 |

I'm plotting this on a chart, so I get a nice line across the graph indicating score-over-time. Lovely.

Now, what I need to do is include the average score on the chart, so we can see how that changes over time, so I can simply add this to the mix:

SELECT 
    YEAR(SCOREDATE) 'Year', MONTH(SCOREDATE) 'Month',
    MIN(SCORE) MinScore, 
    AVG(SCORE) AverageScore, 
    MAX(SCORE) MaxScore
FROM SCORES
GROUP BY YEAR(SCOREDATE), MONTH(SCOREDATE) 
ORDER BY YEAR(SCOREDATE), MONTH(SCOREDATE) 

That's no problem so far.

The problem is, how can I easily calculate the percentiles at each time-period? I'm not sure that's the correct phrase. What I need in total is:

  • A line on the chart for the score (easy)
  • A line on the chart for the average (easy)
  • A line on the chart showing the band that 95% of the scores occupy (stumped)

It's the third one that I don't get. I need to calculate the 5% percentile figures, which I can do singly:

SELECT MAX(SubQ.SCORE) FROM
    (SELECT TOP 45 PERCENT SCORE 
    FROM SCORES
    WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1
    ORDER BY SCORE ASC) AS SubQ

SELECT MIN(SubQ.SCORE) FROM
    (SELECT TOP 45 PERCENT SCORE 
    FROM SCORES
    WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1
    ORDER BY SCORE DESC) AS SubQ

But I can't work out how to get a table of all the months.

| Date       | Average | 45% | 55% |
+ -----------+---------+-----+-----+
| 01/01/2010 |      13 |  11 |  15 |
| 02/01/2010 |      10 |   8 |  12 |
| 03/01/2010 |       5 |   4 |  10 |
  ...
| 16/03/2010 |       7 |   7 |   9 |

At the moment I'm going to have to load this lot up into my app, and calculate the figures myself. Or run a larger number of individual queries and collate the results.

A: 

Without the data, I'm not sure if I'm doing this right, but maybe this will help get you there with two queries per year instead of 24...

SELECT MAX(SubQ.SCORE), MyMonth  FROM
    (SELECT TOP 45 PERCENT SCORE , MONTH(SCOREDATE) as MyMonth 
    FROM SCORES
    WHERE YEAR(SCOREDATE) = 2010
    ORDER BY SCORE ASC) AS SubQ
group by  MyMonth   
Nicholai
A: 

Whew. This was a real brain teaser. First, my table schema for testing was:

Create Table Scores 
( 
    Id int not null identity(1,1) primary key clustered
    , [Date] datetime not null
    , Score int not null
)

Now, first, I calculated the values using a CTE in SQL 2008 in order to check my answers and then I built a solution that should work in SQL 2000. So, in SQL 2008 we do something like:

;With 
    SummaryStatistics As
    (
        Select Year([Date]) As YearNum
            , Month([Date]) As MonthNum
            , Min(Score) As MinScore
            , Max(Score) As MaxScore
            , Avg(Score) As AvgScore
        From Scores
        Group By Month([Date]), Year([Date])
    )
    , Percentiles As
    (
        Select Year([Date]) As YearNum
            , Month([Date]) As MonthNum
            , Score
            , NTile( 100 ) Over ( Partition By Month([Date]), Year([Date]) Order By Score ) As Percentile
        From Scores
    )
    , ReportedPercentiles As
    (
        Select YearNum, MonthNum
            , Min(Case When Percentile = 45 Then Score End) As Percentile45
            , Min(Case When Percentile = 55 Then Score End) As Percentile55
        From Percentiles
        Where Percentile In(45,55)
        Group By YearNum, MonthNum
    )
Select SS.YearNum, SS.MonthNum
    , SS.MinScore, SS.MaxScore, SS.AvgScore
    , RP.Percentile45, RP.Percentile55
From SummaryStatistics As SS
    Join ReportedPercentiles As RP
        On  RP.YearNum = SS.YearNum
            And RP.MonthNum = SS.MonthNum
Order By SS.YearNum, SS.MonthNum

Now for a SQL 2000 solution. In essence, the trick is to use a couple of temporary tables to tally the occurances of the scores.

If object_id('tempdb..#Working') is not null
    DROP TABLE #Working
GO
Create Table #Working 
    (
    YearNum int not null
    , MonthNum int not null
    , Score int not null
    , Occurances int not null
    , Constraint PK_#Working Primary Key Clustered ( MonthNum, YearNum, Score )
    )
GO
Insert #Working(MonthNum, YearNum, Score, Occurances)
Select Month([Date]), Year([Date]), Score, Count(*)
From Scores
Group By Month([Date]), Year([Date]), Score
GO
If object_id('tempdb..#SummaryStatistics') is not null
    DROP TABLE #SummaryStatistics
GO
Create Table #SummaryStatistics
    (
    MonthNum int not null
    , YearNum int not null
    , Score int not null
    , Occurances int not null
    , CumulativeTotal int not null
    , Percentile float null
    , Constraint PK_#SummaryStatistics Primary Key Clustered ( MonthNum, YearNum, Score )
    )
GO
Insert #SummaryStatistics(YearNum, MonthNum, Score, Occurances, CumulativeTotal)
Select W2.YearNum, W2.MonthNum, W2.Score, W2.Occurances, Sum(W1.Occurances)-W2.Occurances
From #Working As W1
    Join #Working As W2 
        On W2.YearNum = W1.YearNum
            And W2.MonthNum = W1.MonthNum
Where W1.Score <= W2.Score
Group By W2.YearNum, W2.MonthNum, W2.Score, W2.Occurances

Update #SummaryStatistics
Set Percentile = SS.CumulativeTotal * 100.0 / MonthTotal.Total
From #SummaryStatistics As SS
    Join    (
            Select SS1.YearNum, SS1.MonthNum, Max(SS1.CumulativeTotal) As Total
            From #SummaryStatistics As SS1
            Group By SS1.YearNum, SS1.MonthNum
            ) As MonthTotal
        On MonthTotal.YearNum = SS.YearNum
            And MonthTotal.MonthNum = SS.MonthNum

Select GeneralStats.*, Percentiles.Percentile45, Percentiles.Percentile55
From    (
        Select  Year(S1.[Date]) As YearNum
            , Month(S1.[Date]) As MonthNum
            , Min(S1.Score) As MinScore
            , Max(S1.Score) As MaxScore
            , Avg(S1.Score) As AvgScore
        From Scores As S1
        Group By Month(S1.[Date]), Year(S1.[Date])
        ) As GeneralStats
    Join    (
            Select SS1.YearNum, SS1.MonthNum
                , Min(Case When SS1.Percentile >= 45 Then Score End) As Percentile45
                , Min(Case When SS1.Percentile >= 55 Then Score End) As Percentile55
            From #SummaryStatistics As SS1
            Group By SS1.YearNum, SS1.MonthNum 
            ) As Percentiles
        On Percentiles.YearNum = GeneralStats.YearNum
            And Percentiles.MonthNum = GeneralStats.MonthNum
Thomas
Wow, that's quite a set of operations :)Seems to work though, so that's great, thanks!
Cylindric