views:

88

answers:

2

How would you write a MySQL query that will limit the results of a joined table (or sub select if that works better) and also counts the number of items in the joined table or tables?

For instance, let's say you had three tables: projects, tasks and comments, where a project has 0 or more tasks and a task has 0 or more comments. How would you limit the number of tasks returned per project to 3 and also return the total number of tasks per project and comments per task?

Here's what I imagine the result set look like:

project_id, project_title, task_id, task_title, num_tasks, num_comments
------------------------------------------------------------------------
1, Project1, 1, Task1, 4, 3
1, Project1, 2, Task2, 4, 0
1, Project1, 3, Task3, 4, 9
2, Project2, 10, Task10, 20, 0
2, Project2, 11, Task11, 20, 0
2, Project2, 12, Task12, 20, 2
3, Project3, 20, Task20, 17, 5
3, Project3, 21, Task21, 17, 1
3, Project3, 22, Task22, 17, 2

Where 'Project1', 'Project2', etc just represent a project's title and 'Task1', 'Task2', etc represent a task's title.

Ultimately, (after parsing through the results of the query) I'd like to be able to display something like this:

 Project1 (4 tasks)
     Task1 (3 comments)
     Task2 (0 comments)
     Task3 (9 comments)
 Project2 (20 tasks)
     Task10 (0 comments)
     Task11 (0 comments)
     Task12 (2 comments)
 Project3 (17 tasks)
     Task20 (5 comments)
     Task21 (1 comments)
     Task22 (2 comments)

I'm guessing this has to be done with sub selects (which is fine), but I can't seem to figure out how to accomplish this with just using joins and I don't quite have a good enough handle on sub selects to do something like this.

A: 

Honestly, I'd do this in multiple queries, to avoid the correlated subqueries.

But here you go:

SELECT p.project_id, p.project_title,
    t1.task_id, t1.task_title,
    (SELECT COUNT(*) FROM tasks t 
       WHERE t.project_id = p.project_id) AS num_tasks,
    COALESCE((SELECT COUNT(*) FROM comments c
       WHERE c.task_id = t1.task_id), 0) AS num_comments
FROM projects p
JOIN tasks t1 ON (p.project_id = t1.project_id)
LEFT OUTER JOIN tasks t2 
  ON (p.project_id = t2.project_id AND t1.task_id > t2.task_id)
GROUP BY t1.task_id
HAVING COUNT(*) < 3;

Consider that correlated subqueries like those above (num_tasks and num_comments) must execute many times -- once for each row of t1.

You can get the results by running these queries separately and combining the results in your application code:

SELECT p.project_id, p.project_title,
    t1.task_id, t1.task_title
FROM projects p
JOIN tasks t1 ON (p.project_id = t1.project_id)
LEFT OUTER JOIN tasks t2 
  ON (p.project_id = t2.project_id AND t1.task_id > t2.task_id)
GROUP BY t1.task_id
HAVING COUNT(*) < 3;

SELECT task_id, COUNT(*) AS num_comments
FROM comments
WHERE task_id IN (...list of task_id values from first query...)
GROUP BY task_id;

SELECT project_id, COUNT(*) AS num_tasks
FROM tasks
GROUP BY project_id;

Even running three separate queries like this might be faster overall than running the more complex query that gets all the results together. I say might because it depends on how much data we're talking about. To be sure, you'd have to test both solutions using your own database.


Re your followup question, I'd do this in a subquery:

SELECT p.project_id, p.project_title,
    t1.task_id, t1.task_title
FROM (SELECT * FROM projects ORDER BY last_updated DESC LIMIT 5) p
. . .

Note this is not a correlated subquery; the RDBMS only has to do the subquery once.

I used DESC because I assume you want the most recent projects.

Bill Karwin
So, in this case it would be more efficient to split the query into multiple queries and then deal with the separate result sets in the scripting language (PHP, etc)? Are correlated subqueries something that should generally be avoided?
Jason Roberts
That makes a lot of sense, but I have one more question, which I meant to include in my original question. In the first query (of the set of three queries), how would you limit the total number of projects returned. In it's current form, it returns up to three tasks for all projects, but let's say I wanted to limit it to no more than 5, ordered by the project field last_updated? By the way, thanks so much for all of your help.
Jason Roberts
That's actually the same thing I tried, but apparently the version of MySQL(5.0.45) I'm stuck with doesn't support LIMIT in subqueries.
Jason Roberts
A: 

I would say you'd have to use multiple queries and loops for something like this.
There may be a way, but its beyond the time I have :)
Here's some suedo code to show how I'd accomplish this

select project_id, project_title from projects
select project_id, count(*) As num_tasks from tasks group by project_id
select task_id, count(*) As num_comment from comments group by task_id

foreach (int projectId in projects.Rows)
{
    select task_id, task_title from tasks where project_id = projectID limit 3
    foreach (int taskID in tasks.Rows)
    {
        select comment_id, comment from comments limit 3
    }
}
keith