views:

97

answers:

2

Question

  • How do I write a T-SQL Stored Procedure that lets me select percentages of rows between X% and Y%?
  • So basically I would want to select the rows between 30 PERCENT and 40 PERCENT.....

I know that you can do the following, but obviously that doesn't let met specify a set of rows between 2 percentages.

SELECT TOP 50 PERCENT * FROM tblAssets 

Help greatly appreciated.

+11  A: 

Updated Answer

declare @NumRecords int
SELECT @NumRecords = COUNT(*) FROM tblAssets;

With Vals As
(
SELECT tblAssets.AssetId ...
, ROW_NUMBER() OVER ( order by tblAssets.AssetId) as RN
  FROM tblAssets
)

SELECT  tblAssets.AssetId ...
FROM vals 
Where RN between 0.3*@NumRecords and 0.4*@NumRecords

I've updated my answer as there were 2 problems with my original answer below

  1. Performance - It was beaten by the nested TOP solution
  2. Accuracy - There is an unexpected aspect of NTILE that I was not aware of

If the number of rows in a partition is not divisible by integer_expression, this will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example if the total number of rows is 53 and the number of groups is five, the first three groups will have 11 rows and the two remaining groups will have 10 rows each.

I got the following values comparing with the nested TOP solution.

SET STATISTICS IO ON
SET STATISTICS TIME ON;

DECLARE @NumRecords int
SELECT @NumRecords = COUNT(*) FROM [master].[dbo].[spt_values];

WITH Vals As
(
SELECT  [number]
, ROW_NUMBER() OVER ( order by [number]) as RN
  FROM [master].[dbo].[spt_values]
)

SELECT [number] FROM vals Where RN
 BETWEEN 0.30*@NumRecords AND 0.40*@NumRecords

Gives

Table 'spt_values'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'spt_values'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SELECT TOP 25 PERCENT [number] FROM
(
SELECT TOP 40 PERCENT  [number]
FROM  [master].[dbo].[spt_values]
ORDER BY [number]  ASC
) TOP40
ORDER BY [number] DESC

Gives

Table 'Worktable'. Scan count 1, logical reads 4726, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'spt_values'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Original Answer

With Vals As
(
SELECT tblAssets.AssetId ...
, NTILE (100)  OVER ( order by tblAssets.AssetId) as Pct
  FROM tblAssets 
)

SELECT * FROM vals Where Pct between 30 and 40
Martin Smith
+1: Nice. I really need to learn all of the new OVER functions in SQL Server. I was thinking TOP 25 PERCENT in DESC order after TOP 40 PERCENT. NTILE looks much cleaner.
Tom H.
Interesting! I was thinking the other day that NTILE is useless without the ability to define your own buckets...
OMG Ponies
will this work in sql server 2005?
Goober
@Goober - Yes. I just want to check the execution plan against Tom's suggested answer. I'll let you know the results.
Martin Smith
I'm not quite sure I understand the syntax....I'm fairly new to T-SQl. Why have you put "..." after "SELECT tblAssets.AssetId" - PRESUME it's not syntax
Goober
@Goober - Because I never use '*' so it was just an indication to replace the ... with your actual column list!
Martin Smith
A: 

I came up with this myself.......

SELECT TOP 40 *  
    INTO #TOP40
    FROM CCDtblAssets

 SELECT * FROM #TOP40
WHERE ASSETID NOT IN   
(SELECT TOP 30 ASSETID FROM #TOP40)

Although I do like Martins Answer.

Goober