I'm looking to create a histogram in SQL (which in itself isn't too tricky), but what I'm looking for is a way of splitting the bins so that each bin / band has the same proportion of the data included within.
For example, given the sample data below (the Value column) and a target of 5 bins, I know that I can work out the width of each bin with something like
(MAX(Value) - MIN(Value)) / numberofsteps
which gives the groups shown in the band 1 column.
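In case it helps, here is a rough sketch of that equal-width calculation in T-SQL. The table name dbo.Samples and the variable @NumberOfSteps are placeholders made up for this example, and it is an illustration rather than tested production code:

```sql
-- Assumption: a hypothetical table dbo.Samples with a numeric column Value.
DECLARE @NumberOfSteps int;
SET @NumberOfSteps = 5;

WITH Stats AS (
    SELECT MIN(Value) AS MinValue,
           -- * 1.0 forces decimal arithmetic; integer division would truncate the width
           (MAX(Value) - MIN(Value)) * 1.0 / @NumberOfSteps AS BandWidth
    FROM dbo.Samples
)
SELECT s.Value,
       CASE
           -- The maximum value lands exactly on the upper edge, so clamp it into the last band
           WHEN FLOOR((s.Value - st.MinValue) / NULLIF(st.BandWidth, 0)) + 1 > @NumberOfSteps
               THEN @NumberOfSteps
           ELSE FLOOR((s.Value - st.MinValue) / NULLIF(st.BandWidth, 0)) + 1
       END AS EqualWidthBand
FROM dbo.Samples AS s
CROSS JOIN Stats AS st;
```

This numbers the bands 1 to @NumberOfSteps by width only, so the row counts per band can be very uneven. NULLIF guards against a zero width when every value is identical, in which case the band comes back as NULL.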
However, what I want is for the bands to be calculated so that each band accounts for (100 / n)% of the total, where n is the number of bands. In this case each of the 5 bands would represent 20% of the data, which is what the band 2 column shows.
Value | band 1 | band 2
1 | 1 to 2 | 0 to 1
1 | 1 to 2 | 0 to 1
1 | 1 to 2 | 0 to 1
1 | 1 to 2 | 0 to 1
2 | 1 to 2 | 2 to 3
2 | 1 to 2 | 2 to 3
3 | 3 to 4 | 2 to 3
3 | 3 to 4 | 2 to 3
4 | 3 to 4 | 4 to 6
4 | 3 to 4 | 4 to 6
5 | 5 to 6 | 4 to 6
6 | 5 to 6 | 4 to 6
7 | 7 to 8 | 7 to 8
8 | 7 to 8 | 7 to 8
8 | 7 to 8 | 7 to 8
8 | 7 to 8 | 7 to 8
9 | 9 to 10 | 9 to 10
10 | 9 to 10 | 9 to 10
10 | 9 to 10 | 9 to 10
10 | 9 to 10 | 9 to 10
Is there a way to do this in SQL (I'm using SQL Server 2005, if that helps)? Ideally it would not require creating a UDF, and it would be great if I could easily alter the number of bins (if that's not asking the impossible!).
Thanks