views:

405

answers:

5

I'm struggling with a problem in T-SQL: I need to get the top 10 results for each user from a table that may contain more than 10 results per user.

My natural (and procedurally minded) approach is "for each user in table T select the top 10 results ordered by date".

Each time I try to formulate the question in a set-based way, I keep running into the term "foreach".

Is it possible to do something like this:

SELECT *
FROM table AS t1
INNER JOIN (
    SELECT TOP 10 *
    FROM table AS t2
    WHERE t2.id = t1.id
    ORDER BY date DESC
)

Or even

SELECT (    SELECT TOP 10 *
             FROM table AS t2
             WHERE t2.id = t1.id
             ORDER BY date    )
FROM table AS t1

Or is there another solution to this using temp tables that I should think about?

EDIT:

Just to be perfectly clear: I need the top 10 results for each user in the table, i.e. up to 10 * N rows, where N = number of users.

EDIT:

In response to a suggestion made by RBarryYoung, I'm having an issue, which is best demonstrated with code:

CREATE TABLE #temp (id INT, date DATETIME)

INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())

SELECT *
FROM #temp AS t1
CROSS APPLY (
    SELECT TOP 1 *
    FROM #temp AS t2
    WHERE t2.id = t1.id
    ORDER BY t2.date DESC
) AS t2

DROP TABLE #temp

Running this, you can see that this doesn't limit the results to the TOP 1... Am I doing something wrong here?

EDIT:

It seems my last example caused some confusion. Here is an example showing what I want to do:

CREATE TABLE #temp (id INT, date DATETIME)
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (1, GETDATE())
INSERT INTO #temp (id, date) VALUES (2, GETDATE())

SELECT *
FROM #temp AS t1
CROSS APPLY
(
    SELECT TOP 2 *
    FROM #temp AS t2
    WHERE t2.id = t1.id
    ORDER BY t2.date DESC
) AS t2

DROP TABLE #temp

This outputs:

1 2009-08-26 09:05:56.570 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.570 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
1 2009-08-26 09:05:56.583 1 2009-08-26 09:05:56.583
2 2009-08-26 09:05:56.583 2 2009-08-26 09:05:56.583

If I use distinct:

SELECT DISTINCT t1.id
FROM #temp AS t1
CROSS APPLY
(
    SELECT TOP 2 *
    FROM #temp AS t2
    WHERE t2.id = t1.id
    ORDER BY t2.date DESC
) AS t2

I get

1
2

I need

1
1
2

Does anyone know if this is possible?

EDIT:

The following code does this:

WITH RowTable AS
(
    SELECT
        id, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS RowNum
    FROM #temp
)
SELECT *
FROM RowTable
WHERE RowNum <= 2;

I posted this in the comments, but comments have no code formatting, so it doesn't look very nice.
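A minimal, self-contained sketch of the ROW_NUMBER() approach above, assuming SQL Server 2005 or later. It reuses the question's #temp layout, but the dates are fixed, made-up values instead of GETDATE() so that the ordering is repeatable.

```sql
-- Recreate the question's #temp table with fixed, made-up dates
-- so that the ordering is deterministic.
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
    DROP TABLE #temp;

CREATE TABLE #temp (id INT, date DATETIME);

INSERT INTO #temp (id, date) VALUES (1, '2009-08-26T09:00:00');
INSERT INTO #temp (id, date) VALUES (1, '2009-08-26T10:00:00');
INSERT INTO #temp (id, date) VALUES (1, '2009-08-26T11:00:00');
INSERT INTO #temp (id, date) VALUES (2, '2009-08-26T09:30:00');

-- Number each user's rows from newest (1) to oldest, then keep the first 2.
WITH RowTable AS
(
    SELECT
        id,
        date,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS RowNum
    FROM #temp
)
SELECT id, date, RowNum
FROM RowTable
WHERE RowNum <= 2
ORDER BY id, RowNum;
```

Against this data it returns two rows for id 1 (11:00 and 10:00) and one row for id 2, which is the 1, 1, 2 pattern asked for above. If two rows of the same user share a date, ROW_NUMBER() breaks the tie arbitrarily; adding another column to the window's ORDER BY makes the choice deterministic.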

A: 

It is possible; however, using nested queries will be slower.

The following will also find the results you are looking for:

SELECT TOP 10 * 
FROM table as t1
INNER JOIN table as t2 
  ON t1.id = t2.id
ORDER BY t1.date DESC
Russell
That will only return 10 results; I need to select 10 for each user.
Khanzor
Ah ok, sorry about that! Your first query would work faster than the second one.
Russell
A: 

I believe this SO question will answer your question. It's not answering exactly the same question, but I think the solution will work for you too.

Jeff Leonard
Not quite: I want to select just the top 10 results (or fewer); that answer appears to select only rows whose rank is under that number. It seems rather similar to a HAVING COUNT(*) clause?
Khanzor
+3  A: 

Yes, there are several good ways to do this in SQL Server 2005 and 2008. The one most similar to what you are already trying is with CROSS APPLY:

SELECT t2.*
FROM (
    SELECT DISTINCT id FROM table
) AS t1
CROSS APPLY (
    SELECT TOP 10 *
    FROM table AS t2
    WHERE t2.id = t1.id
    ORDER BY date DESC
) AS t2
ORDER BY t2.id, t2.date DESC

This then returns the ten most recent entries in [table] (or as many as exist, up to 10) for each distinct [id]. Assuming that [id] corresponds to a user, this should be exactly what you are asking for.

(Edit: slight changes, because I did not take into account that t1 and t2 are the same table, so every duplicate t1.id would match the same set of t2 rows again.)

RBarryYoung
+1 Awesome suggestion! I've never seen CROSS APPLY before. However, that doesn't limit the results to 10? See latest edit for an example.
Khanzor
On the outer SELECT *, use SELECT TOP (N*10) instead: if you want to see 3 users, use SELECT TOP 30. But in this case every user has to have at least 10 results...
THEn
@THEn: that's not quite right. It will just limit the total number of results, while still leaving some users with more than 10 rows in the result set.
Khanzor
Khanzor: If I understand your concerns, then the (DISTINCT) subquery that I added should fix this (sorry, I cannot test from here, so am coding blind).
RBarryYoung
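A minimal sketch of the DISTINCT + CROSS APPLY pattern from this answer, run against the question's #temp layout, assuming SQL Server 2005 or later. It uses TOP 2 to match the question's example, and fixed, made-up dates in place of GETDATE() so the result is repeatable.

```sql
-- Same layout as the question's #temp example, with made-up fixed dates.
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
    DROP TABLE #temp;

CREATE TABLE #temp (id INT, date DATETIME);

INSERT INTO #temp (id, date) VALUES (1, '2009-08-26T09:00:00');
INSERT INTO #temp (id, date) VALUES (1, '2009-08-26T10:00:00');
INSERT INTO #temp (id, date) VALUES (1, '2009-08-26T11:00:00');
INSERT INTO #temp (id, date) VALUES (2, '2009-08-26T09:30:00');

-- t1 yields each user id exactly once, so the APPLY runs once per user
-- instead of once per row of #temp.
SELECT t2.id, t2.date
FROM (
    SELECT DISTINCT id FROM #temp
) AS t1
CROSS APPLY (
    -- The newest 2 rows for the current user.
    SELECT TOP 2 id, date
    FROM #temp AS t2
    WHERE t2.id = t1.id
    ORDER BY t2.date DESC
) AS t2
ORDER BY t2.id, t2.date DESC;
```

Against this data it returns two rows for id 1 (11:00 and 10:00) and one for id 2. The DISTINCT derived table is what fixes the duplicates in the question's output: applying directly to #temp runs the TOP 2 subquery once for each of the three id 1 rows, giving six id 1 rows instead of two.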
A: 

Here's a trick I use to do this "top-N-per-group" type of query:

SELECT t1.id
FROM table t1 LEFT OUTER JOIN table t2
 ON (t1.user_id = t2.user_id AND (t1.date < t2.date
     OR (t1.date = t2.date AND t1.id > t2.id)))
GROUP BY t1.id, t1.user_id
HAVING COUNT(t2.id) < 10
ORDER BY t1.user_id, COUNT(t2.id);
Bill Karwin
This only returns the distinct T1.ids?
RBarryYoung
If you need more columns, you have to include them in the `GROUP BY` clause. Or else you could use the above as a subquery in an `IN()` predicate so you can get the rest of the user data.
Bill Karwin
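Following the IN() suggestion above, a minimal sketch of the self-join technique that returns whole rows, assuming SQL Server. #results is a hypothetical table with a unique id and a user_id, as this answer assumes; the data and N = 2 are made up. It counts newer rows (t1.date < t2.date) so that each user's most recent rows are kept, with id as the tie-breaker.

```sql
-- Hypothetical table: id is a unique row id, user_id identifies the user.
IF OBJECT_ID('tempdb..#results') IS NOT NULL
    DROP TABLE #results;

CREATE TABLE #results (id INT PRIMARY KEY, user_id INT, date DATETIME);

INSERT INTO #results (id, user_id, date) VALUES (1, 1, '2009-08-26T09:00:00');
INSERT INTO #results (id, user_id, date) VALUES (2, 1, '2009-08-26T10:00:00');
INSERT INTO #results (id, user_id, date) VALUES (3, 1, '2009-08-26T10:00:00'); -- tie on date
INSERT INTO #results (id, user_id, date) VALUES (4, 2, '2009-08-26T09:30:00');

-- For each row t1, count the rows t2 of the same user that sort ahead of it
-- (newer date, or same date and lower id). A row with fewer than N such
-- rows is in that user's top N; N = 2 here.
SELECT r.*
FROM #results AS r
WHERE r.id IN (
    SELECT t1.id
    FROM #results AS t1
    LEFT OUTER JOIN #results AS t2
        ON t1.user_id = t2.user_id
       AND (t1.date < t2.date
            OR (t1.date = t2.date AND t1.id > t2.id))
    GROUP BY t1.id
    HAVING COUNT(t2.id) < 2
)
ORDER BY r.user_id, r.date DESC, r.id;
```

Here ids 2 and 3 (tied at 10:00) rank 0 and 1 and are kept, id 1 has two rows ahead of it and is dropped, and id 4 is kept for user 2. COUNT(t2.id) is used rather than COUNT(*) so that a row with no match counts as 0: the NULL-extended row from the LEFT JOIN would otherwise count as 1. Because each row is compared with every other row of the same user, this gets expensive for users with many rows.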
A: 
select userid, foo,
       row_number() over (partition by userid order by foo) as rownum
from table
where rownum <= 10
Well, for some reason, that doesn't work with the example table I gave, e.g.:

SELECT id, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS RowNum
FROM #temp
WHERE RowNum <= 2;

But if it's modified to use a WITH blah AS statement:

WITH RowTable AS
(
    SELECT id, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS RowNum
    FROM #temp
)
SELECT *
FROM RowTable
WHERE RowNum <= 2;

It works fine.
Khanzor
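The direct version fails because SQL Server logically evaluates WHERE before the SELECT list, so the RowNum alias does not exist yet at that point (and ROW_NUMBER() itself is only allowed in SELECT and ORDER BY). A CTE, as in the comment, or an equivalent derived table computes RowNum first and filters afterwards. A minimal sketch of the derived-table form on the question's #temp layout, with made-up fixed dates:

```sql
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
    DROP TABLE #temp;

CREATE TABLE #temp (id INT, date DATETIME);

INSERT INTO #temp (id, date) VALUES (1, '2009-08-26T09:00:00');
INSERT INTO #temp (id, date) VALUES (1, '2009-08-26T10:00:00');
INSERT INTO #temp (id, date) VALUES (1, '2009-08-26T11:00:00');
INSERT INTO #temp (id, date) VALUES (2, '2009-08-26T09:30:00');

-- Fails: WHERE is evaluated before the SELECT list defines RowNum.
-- SELECT id, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS RowNum
-- FROM #temp
-- WHERE RowNum <= 2;

-- Works: compute RowNum in a derived table, then filter in the outer query.
SELECT id, date, RowNum
FROM (
    SELECT
        id,
        date,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS RowNum
    FROM #temp
) AS numbered
WHERE RowNum <= 2
ORDER BY id, RowNum;
```

The CTE and the derived table are interchangeable here; the CTE simply names the numbered rows up front, which some find easier to read.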