views:

251

answers:

3

I've seen this question asked a couple of times, and I've written my own query, but it's quite slow, and I would be extremely grateful if someone could offer advice on how to speed it up.

In a simplified scenario, I have the following two tables:

Group
- GroupID (primary key)

Member
- MemberID (primary key)
- GroupID (foreign key)

Let's say, for each GroupID in Group, I want to find the top 2 MemberID values from Member that have that GroupID.

Here's my current query that works, but is painfully slow:

SELECT M.MemberID, M.GroupID
FROM   Member AS M
WHERE  M.MemberID in 
        (Select top 2 Member.MemberID
         FROM Member
         Where Member.GroupID = M.GroupID
         ORDER BY Member.MemberID)

Say Group has the following rows
GroupID
1
2
3

and Member has the following rows
MemberID, GroupID
1, 1
2, 2
3, 3
4, 1
5, 2
6, 3
7, 1
8, 2
9, 3

Then my query should return:
MemberID GroupID
1, 1
2, 2
3, 3
4, 1
5, 2
6, 3

+2  A: 

You could probably use the RANK function to do this, but it might not be any faster. That's because you don't know why your query is slow.

Why not go find out? Look at the execution plan. See if there are table scans going on? Run the Query Optimizer and see what it has to say.

There's no reason to optimize until you know what's wrong.

John Saunders
+4  A: 

I believe the dependent nested query might be really hard for the db engine to optimize well (though @John Saunders' request to see the execution plan is well founded, and seeing what indices you have would not hurt either;-).

But, a more natural approach to such ranking related problems in SQL Server 2005 and 2008 (and other SQL engines, since the feature is in recent ANSI standards) is ranking functions -- RANK, DENSE_RANK, or ROW_NUMBER... they're all equivalent when you're ranking by a unique field, anyway;-). Even apart from optimization, they're easier to read once you're used to them (and more powerful when your problems are harder than this one), especially with the help of that other neat new-ish construct, the WITH clause...:

WITH OrderedMembers AS
(
    SELECT MemberId, GroupId,
    ROW_NUMBER() OVER (PARTITION BY GroupId ORDER BY MemberId) AS RowNumber
    FROM Member 
) 
SELECT MemberId, GroupId
FROM OrderedMembers 
WHERE RowNumber <= 2
ORDER BY MemberId;
Alex Martelli
A: 

Thank you John and Alex for your replies. I'm pretty fresh out of school and very new to SQLServer, so the execution plan option was brand new to me. It reported that 96% of the query cost was being consumed by a Clustered Index Scan which I'm assuming was a result of the nested query. Truth be told, I'm not quite sure what the next step would have been to optimize.

Alex, the query you provided ran in the blink of an eye on my dataset.

Thank you again gentlemen, I really appreciate your assistance.

csj
Tip: If Alex's answer solved your problem, I'd upvote his answer and accept it as the best answer.
sheepsimulator
I would like to do that, but I can't. I originally made this post from an anonymous account. Unfortunately, when I created a proper ID, I goofed and failed to merge the two. The original account is now inaccessible to me.
csj