views:

46

answers:

0

I need a T-SQL ranking approach similar to the one provided by NTILE(), except that the members of each tile would be on a sliding distribution so that higher ranking tiles have fewer members.

For example

CREATE TABLE #Rank_Table(
id int identity(1,1) not null,
hits bigint not null default 0,
PERCENTILE smallint null
)
--Slant the distribution of the data
INSERT INTO #Rank_Table (hits)
select CASE 
  when DATA > 9500 THEN DATA*30
  WHEN data > 8000  THEN DATA*5 
  WHEN data < 7000  THEN DATA/3 +1
  ELSE DATA
 END
FROM
 (select top 10000 (ABS(CHECKSUM(NewId())) % 99 +1) * (ABS(CHECKSUM(NewId())) % 99 +1 ) DATA
 from master..spt_values t1
  cross JOIN master..spt_values t2) exponential

Declare @hitsPerGroup as bigint
Declare @numGroups as smallint
set @numGroups=100

select @hitsPerGroup=SUM(hits)/(@numGroups -1) FROM #Rank_Table 

select @hitsPerGroup HITS_PER_GROUP

--This is an even distribution
SELECT  id,HITS, NTILE(@numGroups) Over (Order By HITS DESC) PERCENTILE 
FROM #Rank_Table 
GROUP by id, HITS

--This is my best attempt, but it skips groups because of the erratic distribution
select 
    T1.ID, 
    T1.hits, 
    T.RunningTotal/@hitsPerGroup + 1 TILE,
    T.RunningTotal
FROM    #Rank_Table T1
        CROSS APPLY ( Select SUM(hits) RunningTotal FROM #Rank_Table where hits <= T1.hits) T
order by T1.hits 

DROP TABLE #Rank_Table

In #Rank_table, NTILE(@numGroups) creates an even distribution of @numGroups groups. What I need are @numGroups groups where the Tile 100 has the fewest members, tile 2 would have one or more than tile 1, tile 3 would have 1 or more than tile 2 ... tile 100 would have the most.

I'm using SQL Server 2008. In practice this will be run against a permanent table with potentially millions of rows in order to periodically update the PERCENTILe column with its percentile from 1-100.

My best attempt above will skip percentiles and performs poorly. There must be a better way. My statistics are fuzzy, but it seems that some approach to make the hits more linear would do the trick?