views:

1514

answers:

5

Lets say I have a table containing several hundred million rows that looks something like this:

memID | foo  | bar  | foobar
1     | blah | blah | blah
1     | blah | blah | blah
1     | blah | blah | blah
1     | blah | blah | blah
1     | blah | blah | blah
1     | blah | blah | blah
1     | blah | blah | blah
2     | blah | blah | blah
2     | blah | blah | blah
2     | blah | blah | blah
2     | blah | blah | blah
2     | blah | blah | blah
.
.
.
10001 | blah | blah | blah
10001 | blah | blah | blah

I need a query that will return the top N rows for each memID in a range of member IDs. For example, if N = 3 and the range is 0-2 it should return

memID | foo  | bar  | foobar
1     | blah | blah | blah
1     | blah | blah | blah
1     | blah | blah | blah
2     | blah | blah | blah
2     | blah | blah | blah
2     | blah | blah | blah

I've considered a couple approaches, first creating a massive

SELECT TOP (3) *
FROM table
WHERE memID = 0
UNION ALL
SELECT TOP (3) *
FROM table
WHERE memID = 1
.
.
.

query in code. This isn't really a realistic option for probably obvious reasons.

The second approach is to create a temporary table and loop over the range of memIDs inserting the TOP 3 for each memID into that temporary table.

WHILE @MemID < 10000 AND @MemID > 0
  BEGIN
    INSERT INTO tmp_Table
    SELECT TOP (3) *
     FROM table
     WHERE memID = @MemID

    SET @MemID = @MemID + 1
    END

This works, but I'm wondering if there is a more elegant single query solution that I'm missing.

Cadaeic gave me an answer that works without tinkering, but thank you to everyone that suggested analytics, it looks like I have some serious reading to do.

+1  A: 

If you are using SQL Server 2005 or 2008 you might want to investigate the Ranking Functions

cmsjr
+1  A: 

Use analytics. I didn't test this but should be close:

SELECT memID, foo, bar, foobar 
FROM  (
       SELECT memID, foo, bar, foobar, 
              RANK() OVER (PARTITION BY memID ORDER BY memID) AS 'nRank'
       FROM   table
       WHERE  memID BETWEEN 0 AND 2)
WHERE  nRank <= 3
Barry
A: 
SELECT * FROM Member m
Join ( Select TOP(3) * From Table Order By Table.Id) as t 
    On t.MemberId = m.MemberId
Where m BETWEEN 0 and 10000

should do the trick

Nicholas Mancuso
+2  A: 
SQL> select ename,sal,
  2   row_number()
  3     over (order by sal desc)rn,
  4   rank()
  5     over (order by sal desc)rnk,
  6   dense_rank()
  7     over (order by sal desc)drnk
  8   from emp
  9  order by sal desc
 10  /

ENAME    SAL   RN   RNK   DRNK
-----   ----   --   ---   ----
 KING   5000    1     1      1
 FORD   3000    2     2      2
SCOTT   3000    3     2      2
JONES   2975    4     4      3
BLAKE   2850    5     5      4
CLARK   2450    6     6      5
Brian
+5  A: 
declare @startID int, @endID int, @rowsEach int
select @startID = 0, @endID = 2, @rowsEach = 3


select *
from
(
    select memID, foo, bar, foobar, row_number() over (partition by dense_rank order by dense_rank) [rank_row]
    from
    (
     select memID, foo, bar, foobar, dense_rank() over (order by memID) [dense_rank]
     from #test
     where memID between @startID and @endID
    ) a
) b
where rank_row <= @rowsEach

The result:

memID       foo  bar  foobar rank_row
----------- ---- ---- ------ --------------------
1           blah blah blah   1
1           blah blah blah   2
1           blah blah blah   3
2           blah blah blah   1
2           blah blah blah   2
2           blah blah blah   3

And here's the set-up code if you'd like to test locally:

create table #test
(
      memID  int not null
    , foo  char(4) not null
    , bar  char(4) not null
    , foobar char(4) not null
)

insert into #test (memID, foo, bar, foobar)
select 1, 'blah', 'blah', 'blah'
union all
select 1, 'blah', 'blah', 'blah'
union all
select 1, 'blah', 'blah', 'blah'
union all
select 1, 'blah', 'blah', 'blah'
union all
select 1, 'blah', 'blah', 'blah'
union all
select 1, 'blah', 'blah', 'blah'
union all
select 1, 'blah', 'blah', 'blah'
union all
select 2, 'blah', 'blah', 'blah'
union all
select 2, 'blah', 'blah', 'blah'
union all
select 2, 'blah', 'blah', 'blah'
union all
select 2, 'blah', 'blah', 'blah'
union all
select 10001, 'blah', 'blah', 'blah'
union all
select 10001, 'blah', 'blah', 'blah'
union all
select 10001, 'blah', 'blah', 'blah'
Cadaeic
This works. Thank you to everyone who suggested analytics, I have a lot of reading to do!
blogsdon
Why preserve the select clause in each inner-query? Why not perform your outer select using the IN statement, and inside your IN statement you run the inner queries returning only primary key?You could also do similar with a join on primary key...
thesmart
@smartj blogsdon has indicated there is no PK on this table
Cadaeic