views:

898

answers:

2

The following query:

SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC

yields:

year    id  rate
2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2009    p01 4.4
2002    p01 3.9
2004    p01 3.5
2005    p01 2.1
2000    p01 0.8
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7
2006    p02 4.6
2007    p02 3.3

What I'd like is only the top 5 results for each id (so you'd get 2006, 203, 2008, 2001, 2007 for p01 and 2001, 2004, 2002, 2003 for p02). Is there a way to do this using some kind of LIMIT like modifier that works w/in the GROUP BY?

Thanks!

+2  A: 

This can be done in MySQL, but it is not as simple as adding a LIMIT clause. Here is a link that explains the problem in detail: http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/

It's a good article - he introduces an elegant but naive solution to the "Top N per group" problem, and the gradually improves on it.

danben
A: 

No, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQLs, but not for 5 results per group).

This is a groupwise-maximum type query, which is not trivial to do in SQL. There are various ways to tackle that which can be more efficient for some cases, but for top-n in general you'll want to look at Bill's answer to a similar previous question.

As with most solutions to this problem, it can return more than five rows if there are multiple rows with the same rate value, so you may still need a quantity of post-processing to check for that.

bobince