views:

23

answers:

1

I'm looking to create a histogram in SQL (which in itself isn't too tricky), but what I'm looking for is a way of splitting the bins so that each bin / band has the same proportion of the data included within.

For example if I have the sample data (the value column) and I want to divide it into 5 bins, I know that I can work out the number of bins by doing something like

(MAX(Value) - MIN(Value)) / numberofsteps

Will give the groups we see in the band 1 column.

However what I want is for the bands to be calculated so that each band accounts for (100 / n) % of the total where n is the number of bands (so in this case each of the 5 bands would represent 20% of the total data) - which is what is shown in the band 2 column

Value      band 1     band 2
1     | 1 to 2    | 0 to 1
1     | 1 to 2    | 0 to 1
1     | 1 to 2    | 0 to 1
1     | 1 to 2    | 0 to 1
2     | 1 to 2    | 2 to 3
2     | 1 to 2    | 2 to 3
3     | 1 to 2    | 2 to 3
3     | 1 to 2    | 2 to 3
4     | 3 to 4    | 4 to 6
4     | 3 to 4    | 4 to 6
5     | 5 to 6    | 4 to 6
6     | 5 to 6    | 4 to 6
7     | 7 to 8    | 7 to 8
8     | 7 to 8    | 7 to 8
8     | 7 to 8    | 7 to 8
8     | 7 to 8    | 7 to 8
9     | 9 to 10   | 9 to 10
10    | 9 to 10   | 9 to 10
10    | 9 to 10   | 9 to 10
10  |    9 to 10   | 9 to 10

Is there a way to do this in SQL (i'm using SQL server 2005 if that helps), possibly without creating a UDF and having it so that I can easily alter the number of bins would be great (if that's not asking the impossible!)

Thanks

+2  A: 

To divide into bins you can use the ntile function.

with Vals AS
(
SELECT 1 AS value UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 8 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 10 UNION ALL SELECT 10
), TiledVals AS
(
SELECT value, NTILE(5) OVER (ORDER BY value) AS BinNumber
FROM Vals
)
SELECT value, BinNumber, 
Min(value) OVER (PARTITION BY BinNumber) As StartBin,
 MAX(value) OVER (PARTITION BY BinNumber) As EndBin
FROM TiledVals

Gives

value       BinNumber            StartBin    EndBin
----------- -------------------- ----------- -----------
1           1                    1           1
1           1                    1           1
1           1                    1           1
1           1                    1           1
2           2                    2           3
2           2                    2           3
3           2                    2           3
3           2                    2           3
4           3                    4           6
4           3                    4           6
5           3                    4           6
6           3                    4           6
7           4                    7           8
8           4                    7           8
8           4                    7           8
8           4                    7           8
9           5                    9           10
10          5                    9           10
10          5                    9           10
10          5                    9           10
Martin Smith
Thanks (speedy reply too!) - is there any way of making it so that for each value the band 2 column displays the lower and upper limit of that band as well (in a way that will dynamically change when the band's upper and lower limit are altered)
Davin
@Davin - See edit.
Martin Smith