I have some forum data of the form

post(author, thread_id, text)

For each author, I would like to select 10 distinct thread_ids associated with that author (there may be more than 10, and the number will vary by author).

I'm thinking of using GROUP BY to group on 'author', but I cannot understand how to express the LIMIT on each group, and how to expand each group back into 10 rows.

+1  A: 

Here's a solution to "top N per group" type queries.

Note that you have to choose which 10 threads you want for a given author. For this example, I'm assuming you want the most recent threads (and that thread_id is an auto-increment value, so a larger thread_id means a newer thread), and that the table has a primary key post.post_id to break ties.

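SELECT p1.*
FROM post p1 LEFT OUTER JOIN post p2
 ON (p1.author = p2.author AND (p1.thread_id < p2.thread_id 
   OR p1.thread_id = p2.thread_id AND p1.post_id < p2.post_id))
-- One group per post (not per author), so each post gets its own count.
GROUP BY p1.post_id
-- COUNT(p2.post_id) skips the NULL row produced when the LEFT JOIN finds no match.
HAVING COUNT(p2.post_id) < 10;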
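
If you want to try the idea out, here is a minimal sketch that runs the self-join against an in-memory SQLite database from Python. The post_id column, the sample rows, and the helper names make_sample_db and top_posts_per_author are assumptions made up for illustration. The sketch groups by p1.post_id so that every post gets its own count, and it lists the selected columns explicitly instead of p1.* to stay portable.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory table with made-up rows (post_id is an assumed primary key)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post ("
        " post_id INTEGER PRIMARY KEY,"
        " author TEXT, thread_id INTEGER, text TEXT)"
    )
    # alice posts once in each of 12 threads, bob in 3 threads.
    rows = [("alice", t, "alice in thread %d" % t) for t in range(1, 13)]
    rows += [("bob", t, "bob in thread %d" % t) for t in (3, 7, 9)]
    conn.executemany(
        "INSERT INTO post (author, thread_id, text) VALUES (?, ?, ?)", rows
    )
    return conn


def top_posts_per_author(conn, n=10):
    """Return (author, thread_id) for each author's n highest-ranked posts."""
    query = """
        SELECT p1.author, p1.thread_id
        FROM post p1 LEFT OUTER JOIN post p2
          ON (p1.author = p2.author AND (p1.thread_id < p2.thread_id
            OR p1.thread_id = p2.thread_id AND p1.post_id < p2.post_id))
        GROUP BY p1.post_id, p1.author, p1.thread_id
        HAVING COUNT(p2.post_id) < ?
        ORDER BY p1.author, p1.thread_id DESC, p1.post_id DESC
    """
    return conn.execute(query, (n,)).fetchall()


conn = make_sample_db()
for author, thread_id in top_posts_per_author(conn):
    print(author, thread_id)
```

With this sample data, alice has 12 threads and only the 10 with the largest thread_id come back; bob has 3 and keeps all of them.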


Re your follow-up question in the comment, here's the explanation:

Each row in an author's top 10 has 9 or fewer other rows from that author ranked above it. So for each of an author's posts (p1), we count how many posts (p2) by the same author have a greater thread_id. If that count is less than 10, then that post (p1) belongs in the result.

I added a term to resolve ties with the post_id.
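
To make the counting concrete, the sketch below prints the per-post count for a tiny made-up data set instead of filtering on it. The rows, the post_id values, and the ranked_above alias are assumptions for illustration only; it uses SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (post_id INTEGER PRIMARY KEY,"
    " author TEXT, thread_id INTEGER, text TEXT)"
)
# Made-up rows for one author; posts 1 and 2 tie on thread_id 5.
conn.executemany(
    "INSERT INTO post (post_id, author, thread_id, text) VALUES (?, ?, ?, ?)",
    [(1, "alice", 5, "a"), (2, "alice", 5, "b"),
     (3, "alice", 8, "c"), (4, "alice", 2, "d")],
)

# For each post p1, count the same-author posts p2 that rank above it:
# a greater thread_id, or the same thread_id with a greater post_id.
counts = conn.execute("""
    SELECT p1.post_id, p1.thread_id, COUNT(p2.post_id) AS ranked_above
    FROM post p1 LEFT OUTER JOIN post p2
      ON (p1.author = p2.author AND (p1.thread_id < p2.thread_id
        OR p1.thread_id = p2.thread_id AND p1.post_id < p2.post_id))
    GROUP BY p1.post_id, p1.thread_id
    ORDER BY ranked_above
""").fetchall()

for post_id, thread_id, ranked_above in counts:
    print(post_id, thread_id, ranked_above)
```

The two posts in thread 5 get different counts (1 and 2) because of the post_id term, so a cutoff such as "count less than 3" keeps posts 3, 2 and 1 and drops post 4.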

Bill Karwin
Thanks, that's useful. I'm not fully understanding the thought process though, would you be so kind as to elaborate on how it works?
saffsd