+2  Q: 

Union and order by

Consider a table like

tbl_ranks
--------------------------------
family_id | item_id | view_count 
--------------------------------
1           10        101
1           11        112
1           13        109

2           21        101
2           22        112
2           23        109

3           30        101
3           31        112
3           33        109

4           40        101
4           51        112
4           63        109

5           80        101
5           81        112
5           88        109

I need to generate a result set with the top two (2) rows for each family id in a subset (say 1, 2, 3 and 4), ordered by view count. I'd like to do something like

select top 2 * from tbl_ranks where family_id = 1 order by view_count
union all
select top 2 * from tbl_ranks where family_id = 2 order by view_count
union all
select top 2 * from tbl_ranks where family_id = 3 order by view_count
union all
select top 2 * from tbl_ranks where family_id = 4 order by view_count

but, of course, order by isn't valid in a union all context in this manner. Any suggestions? I know I could run a set of 4 queries, store the results into a temp table and select the contents of that temp as the final result, but I'd rather avoid using a temp table if possible.

Note: in the real app, the number of records per family id is indeterminate, and the view_counts are also not fixed as they appear in the above example.

+2  A: 

Use:

SELECT *
  FROM (select *,
               ROW_NUMBER() OVER (PARTITION BY family_id ORDER BY view_count DESC) 'rank'
          from tbl_ranks) x
  WHERE x.rank <= 2
ORDER BY ...

The rationale is to assign a ranking, and then filter based on it.

OMG Ponies
Should probably use a sub select to access the rank column.
astander
I don't believe that the rank column is available in the WHERE predicate. Analytics are evaluated *after* predicates, so you need a sub-select for this to work.
LBushkin
+1  A: 
SELECT  tro.*
FROM    family
CROSS APPLY
        (
        SELECT  TOP 2 *
        FROM    tbl_ranks tr
        WHERE   tr.family_id = family.id
        ORDER BY
                view_count DESC
        ) tro
WHERE   family.id IN (1, 2, 3, 4)

If you don't have an actual family table, you can construct it using a set of unions or a recursive CTE:

WITH   family AS
       (
       SELECT  1 AS id
       UNION ALL
       SELECT  2 AS id
       UNION ALL
       SELECT  3 AS id
       UNION ALL
       SELECT  4 AS id
       )
SELECT  tro.*
FROM    family
CROSS APPLY
        (
        SELECT  TOP 2 *
        FROM    tbl_ranks tr
        WHERE   tr.family_id = family.id
        ORDER BY
                view_count DESC
        ) tro
WHERE   family.id IN (1, 2, 3, 4)
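
The recursive CTE alternative mentioned above is not shown in the answer. A minimal sketch, assuming the wanted ids form the contiguous range 1 to 4, might look like this:

```sql
-- Sketch only: builds ids 1..4 recursively; assumes the ids are contiguous.
WITH family AS
(
    SELECT 1 AS id              -- anchor member: first id
    UNION ALL
    SELECT id + 1               -- recursive member: next id
    FROM   family
    WHERE  id < 4               -- stop once id 4 has been produced
)
SELECT tro.*
FROM   family
CROSS APPLY
(
    SELECT TOP 2 *
    FROM   tbl_ranks tr
    WHERE  tr.family_id = family.id
    ORDER BY tr.view_count DESC
) tro;
```

For ranges much longer than 100 ids, SQL Server's default recursion limit would need raising with OPTION (MAXRECURSION n).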

Make sure you have an index on tbl_ranks (family_id, view_count).
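
A sketch of how such an index could be created. The index name is hypothetical, and the DESC on view_count is an assumption made to mirror the ORDER BY in the query; the answer itself does not specify a sort direction:

```sql
-- ix_tbl_ranks_family_view is a hypothetical index name.
-- DESC on view_count mirrors ORDER BY view_count DESC in the CROSS APPLY query.
CREATE INDEX ix_tbl_ranks_family_view
    ON tbl_ranks (family_id, view_count DESC);
```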

This will be efficient if you have lots of rows per family, since analytic functions like ROW_NUMBER will not use the TOP method when used with PARTITION BY.

Quassnoi
+1  A: 

If you're using SQL Server 2005 or later, you can take advantage of analytic functions:

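SELECT * FROM (
   -- DESC so that rank 1 is the highest view_count
   SELECT RANK() OVER (PARTITION BY family_id ORDER BY view_count DESC) AS RNK, * FROM ...
     ) ranked   -- SQL Server requires an alias on a derived table
WHERE RNK <= 2
ORDER BY ...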
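
One thing to watch with RANK: tied rows share the same rank, so RNK <= 2 can return more than two rows per family, whereas ROW_NUMBER always numbers rows 1, 2, 3 and so on. A small sketch with made-up data (the @ties table and its values are hypothetical, not from the question):

```sql
-- Hypothetical sample data: three items in one family with the same view_count.
DECLARE @ties TABLE (family_id INT, item_id INT, view_count INT);

INSERT INTO @ties SELECT 1, 10, 112;
INSERT INTO @ties SELECT 1, 11, 112;
INSERT INTO @ties SELECT 1, 13, 112;

SELECT family_id, item_id, view_count, rnk, rn
FROM   (
       SELECT *,
              RANK()       OVER (PARTITION BY family_id ORDER BY view_count DESC) AS rnk,
              ROW_NUMBER() OVER (PARTITION BY family_id ORDER BY view_count DESC) AS rn
       FROM   @ties
       ) numbered
WHERE  rnk <= 2;   -- keeps all three tied rows; filtering on rn <= 2 would keep only two
```

If exactly two rows per family are required regardless of ties, ROW_NUMBER is the safer choice.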
LBushkin
+1  A: 

You can try something like this

DECLARE @tbl_ranks TABLE(
     family_id INT,
     item_id INT,
     view_count INT
)

INSERT INTO @tbl_ranks SELECT 1,10,101
INSERT INTO @tbl_ranks SELECT 1,11,112
INSERT INTO @tbl_ranks SELECT 1,13,109

INSERT INTO @tbl_ranks SELECT 2,21,101
INSERT INTO @tbl_ranks SELECT 2,22,112
INSERT INTO @tbl_ranks SELECT 2,23,109

INSERT INTO @tbl_ranks SELECT 3,30,101
INSERT INTO @tbl_ranks SELECT 3,31,112
INSERT INTO @tbl_ranks SELECT 3,33,109

INSERT INTO @tbl_ranks SELECT 4,40,101
INSERT INTO @tbl_ranks SELECT 4,51,112
INSERT INTO @tbl_ranks SELECT 4,63,109

INSERT INTO @tbl_ranks SELECT 5,80,101
INSERT INTO @tbl_ranks SELECT 5,81,112
INSERT INTO @tbl_ranks SELECT 5,88,109

SELECT  *
FROM    (
      SELECT *,
        ROW_NUMBER() OVER(PARTITION BY family_id ORDER BY view_count DESC) MyOrder
      FROM @tbl_ranks
     ) MyOrders
WHERE   MyOrder <= 2
astander
I modified this to use a CTE instead of the nested select, but otherwise it's perfect. Thanks!
David Lively
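
The CTE version mentioned in the comment above is not shown. A sketch of what such a rewrite might look like against the real table, also restricting to families 1 to 4 as the question asks (the CTE name ranked and the column rn are made up for illustration):

```sql
-- Sketch only: number the rows per family, then keep the first two of each.
WITH ranked AS
(
    SELECT family_id, item_id, view_count,
           ROW_NUMBER() OVER (PARTITION BY family_id ORDER BY view_count DESC) AS rn
    FROM   tbl_ranks
    WHERE  family_id IN (1, 2, 3, 4)   -- the subset of families from the question
)
SELECT family_id, item_id, view_count
FROM   ranked
WHERE  rn <= 2
ORDER BY family_id, view_count DESC;
```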