Say, I have a table in the database (SQL Server 2008) with data similar to this (but much, much bigger):

| ID | SCORE | GROUP |
-----------------------
| 10 |     1 | A     |
| 6  |     2 | A     |
| 3  |     3 | A     |
|----|-------|-------|
| 8  |     5 | B     |
|----|-------|-------|
| 4  |     1 | C     |
| 9  |     3 | C     |
| 2  |     4 | C     |
| 7  |     4 | C     |
|----|-------|-------|
| 12 |     3 | D     |
| 1  |     3 | D     |
| 11 |     4 | D     |
| 5  |     6 | D     |

I'd like to get the ID of the top and bottom records for each GROUP, where the records in each group are ordered by SCORE (with ID as the tie-breaker), like this:

| GROUP | MIN_ID | MAX_ID  |
----------------------------
| A     | 10     | 3       |
| B     | 8      | 8       |
| C     | 4      | 7       |
| D     | 1      | 5       |

The question is: how can I achieve this?

So far, I have been attempting solutions based on the RANK() function, but haven't managed to find a query that both produces the correct output and is even vaguely efficient or maintainable.


Notes:

The example is simplified. My 'table' is actually the output of an already complex query, which I'm looking to add the final stages to. I would prefer to only select from the table once.

If possible, it would be good to have a general solution which would allow me to select the top and bottom n values per group.

The IDs are not in convenient order.

+2  A: 
DECLARE @YourTable TABLE (ID INTEGER, Score INTEGER, [Group] VARCHAR(1))
INSERT INTO @YourTable VALUES (10, 1, 'A')
INSERT INTO @YourTable VALUES (6 , 2, 'A')
INSERT INTO @YourTable VALUES (3 , 3, 'A')
INSERT INTO @YourTable VALUES (8 , 5, 'B')
INSERT INTO @YourTable VALUES (4 , 1, 'C')
INSERT INTO @YourTable VALUES (9 , 3, 'C')
INSERT INTO @YourTable VALUES (2 , 4, 'C')
INSERT INTO @YourTable VALUES (7 , 4, 'C')
INSERT INTO @YourTable VALUES (12, 3, 'D')
INSERT INTO @YourTable VALUES (1 , 3, 'D')
INSERT INTO @YourTable VALUES (11, 4, 'D')
INSERT INTO @YourTable VALUES (5 , 6, 'D')


-- Find each group's lowest and highest score, join back to pick up the matching IDs,
-- then collapse ties: lowest ID among the min-score rows, highest ID among the max-score rows.
SELECT [Group], MIN([Min_ID]) AS [Min_ID], MAX([Max_ID]) AS [Max_ID]
FROM (
  SELECT [score].[Group], [Min_ID] = [min].ID, [Max_ID] = [max].ID
  FROM (
    SELECT [Group], [Min_Score] = MIN(Score), [Max_Score] = MAX(Score)
    FROM @YourTable
    GROUP BY [Group]) score
    INNER JOIN @YourTable [min] ON [min].[Group] = [score].[Group] AND [min].[Score] = [score].[Min_Score]
    INNER JOIN @YourTable [max] ON [max].[Group] = [score].[Group] AND [max].[Score] = [score].[Max_Score]
  ) yourtable
GROUP BY [yourtable].[Group]
Lieven
+1  A: 

Any solution will also need a good index on Group and Score that includes ID.
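
As a rough sketch of such an index, assuming the data lives in a real table named mytable (the question says the actual source is a query result, so it would have to be materialized first) and using a made-up index name:

```sql
-- Hypothetical index name; mytable is assumed to hold ID, Score and [Group].
-- Keyed on ([Group], Score) so per-group MIN/MAX of Score can be read in order;
-- ID is included so the lookup does not have to go back to the base table.
CREATE NONCLUSTERED INDEX IX_mytable_Group_Score
    ON mytable ([Group], Score)
    INCLUDE (ID);
```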

SELECT
    foo.[Group],
    m1.ID AS Min_ID,
    m2.ID AS Max_ID
FROM
    (
    SELECT
       [Group], MIN(Score) AS MinScore, MAX(Score) AS MaxScore
    FROM
       mytable
    GROUP BY
       [Group]
    ) foo
    JOIN
    mytable m1 ON foo.[Group] = m1.[Group] AND foo.MinScore = m1.Score
    JOIN
    mytable m2 ON foo.[Group] = m2.[Group] AND foo.MaxScore = m2.Score

If ID and Score were aligned in the same order, this simpler query would also work (the question notes that the IDs are not in a convenient order, so check this against your data first):

SELECT
    [Group],
    MIN(ID) AS Min_ID,
    MAX(ID) AS Max_ID
FROM
    mytable
GROUP BY
    [Group]
gbn
+1. After reading your solution, I think I finally understand what the OP wants. With the values he presented in his example, my solution gives the same result as your solution.
Lieven
Thanks. Yep, he added a comment though to clarify.
gbn
I think I have TDD-litis... Create the simplest thing that could possibly work.
Lieven
+2  A: 

If possible, it would be good to have a general solution which would allow me to select the top and bottom n values per group.

WITH q AS
        (
        SELECT  m.*,
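                -- [Group] must be bracketed because GROUP is a reserved word; ID breaks ties on Score
                ROW_NUMBER() OVER (PARTITION BY [Group] ORDER BY Score, ID) AS rn_asc,
                ROW_NUMBER() OVER (PARTITION BY [Group] ORDER BY Score DESC, ID DESC) AS rn_desc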
        FROM    mytable m
        )
SELECT  *
FROM    q
WHERE   rn_asc BETWEEN 1 AND 10
        OR rn_desc BETWEEN 1 AND 10
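
For the n = 1 case in the question, the same row numbering can be pivoted into one row per group. This is a minimal sketch, assuming the data is available as mytable with columns ID, Score and [Group], and that ID is the tie-breaker on equal scores; it reads the source only once:

```sql
-- Assumption: mytable (ID, Score, [Group]) stands in for the question's data.
WITH q AS
        (
        SELECT  m.[Group], m.ID,
                ROW_NUMBER() OVER (PARTITION BY m.[Group] ORDER BY m.Score, m.ID) AS rn_asc,
                ROW_NUMBER() OVER (PARTITION BY m.[Group] ORDER BY m.Score DESC, m.ID DESC) AS rn_desc
        FROM    mytable m
        )
SELECT  [Group],
        -- Each CASE yields the ID only on the first/last row of the group, NULL elsewhere;
        -- MAX ignores the NULLs and returns that single ID.
        MAX(CASE WHEN rn_asc  = 1 THEN ID END) AS MIN_ID,
        MAX(CASE WHEN rn_desc = 1 THEN ID END) AS MAX_ID
FROM    q
WHERE   rn_asc = 1
        OR rn_desc = 1
GROUP BY [Group];
```

On the sample data from the question this gives A: 10 / 3, B: 8 / 8, C: 4 / 7 and D: 1 / 5.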
Quassnoi
A: 

You could use a subquery (SELECT TOP 1...)
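
One way this could look, as a sketch only: it assumes the data is available as mytable (ID, Score, [Group]) and uses ID as the tie-breaker. Note that it reads the table three times, which the question wanted to avoid.

```sql
-- Assumption: mytable (ID, Score, [Group]) stands in for the question's data.
SELECT  g.[Group],
        -- First row of the group when ordered by Score, then ID
        (SELECT TOP 1 t.ID
         FROM   mytable t
         WHERE  t.[Group] = g.[Group]
         ORDER BY t.Score, t.ID) AS MIN_ID,
        -- Last row of the group in the same ordering
        (SELECT TOP 1 t.ID
         FROM   mytable t
         WHERE  t.[Group] = g.[Group]
         ORDER BY t.Score DESC, t.ID DESC) AS MAX_ID
FROM    (SELECT DISTINCT [Group] FROM mytable) g;
```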

Nick S.