This continues from my last question...

Let me try to explain my schema. I have three tables we'll call users (with columns id and name), parties (with columns id, partydate, and user_id) and questions (with columns id, createdate, and user_id). I need to show, for every user, the number of parties within the last year and the number of questions created within the last year.

My query looks like:

SELECT users.id, users.name,  
  SUM(CASE WHEN (parties.partydate > NOW() - interval '1 year') THEN 1 ELSE 0 END) 
    AS numparties, 
  SUM(CASE WHEN (questions.createdate > NOW() - interval '1 year') THEN 1 ELSE 0 END)
    AS numquestions
FROM users
  LEFT JOIN parties ON users.id=parties.user_id
  LEFT JOIN questions ON users.id=questions.user_id
GROUP BY users.id, users.name;

This almost works. I get a row for every user that exists. The problem is that for a small number of users I count a party or a question twice. For example, if I change the query above to select parties.id and questions.id instead of the sums, and remove the GROUP BY, I might get something like:

users.id | users.name | parties.id | questions.id
---------+------------+------------+-------------
0        | John       | 15         | 2
0        | John       | 15         | 7

As you can see, party 15 appears twice, once for each of John's two questions.
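The duplication comes from joining two one-to-many tables to users at once: each of a user's parties is paired with each of that user's questions. A minimal reproduction, sketched with Python's built-in sqlite3 module and an in-memory database. The ids come from the table above, the dates are hypothetical sample data, and SQLite's date('now', '-1 year') stands in for PostgreSQL's NOW() - interval '1 year'.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL one (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE parties   (id INTEGER PRIMARY KEY, partydate TEXT, user_id INTEGER);
    CREATE TABLE questions (id INTEGER PRIMARY KEY, createdate TEXT, user_id INTEGER);

    -- Hypothetical data: John has one recent party (15) and two recent questions (2, 7).
    INSERT INTO users     VALUES (0, 'John');
    INSERT INTO parties   VALUES (15, date('now', '-1 month'), 0);
    INSERT INTO questions VALUES (2,  date('now', '-1 month'), 0);
    INSERT INTO questions VALUES (7,  date('now', '-2 months'), 0);
""")

# Joined rows before grouping: the single party is paired with each question.
raw_rows = conn.execute("""
    SELECT users.id, users.name, parties.id, questions.id
    FROM users
      LEFT JOIN parties   ON users.id = parties.user_id
      LEFT JOIN questions ON users.id = questions.user_id
    ORDER BY questions.id
""").fetchall()

# The SUM(CASE ...) query from the question, with SQLite date syntax.
naive = conn.execute("""
    SELECT users.id, users.name,
      SUM(CASE WHEN parties.partydate > date('now', '-1 year') THEN 1 ELSE 0 END) AS numparties,
      SUM(CASE WHEN questions.createdate > date('now', '-1 year') THEN 1 ELSE 0 END) AS numquestions
    FROM users
      LEFT JOIN parties   ON users.id = parties.user_id
      LEFT JOIN questions ON users.id = questions.user_id
    GROUP BY users.id, users.name
""").fetchall()

print(raw_rows)  # [(0, 'John', 15, 2), (0, 'John', 15, 7)]
print(naive)     # [(0, 'John', 2, 2)]  (one party counted twice)
```

A user with p recent parties and q recent questions produces p × q joined rows, so each SUM is inflated by the size of the other table's match.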

When I was using COUNT() I could rely on DISTINCT, but with SUM I'm not sure how. I want something like:

SUM(CASE WHEN (parties.partydate > NOW() - interval '1 year' AND parties.id IS DISTINCT) THEN 1 ELSE 0 END) 
AS numparties, 

But of course this isn't valid. Can this small problem be corrected easily?
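One alternative that neither answer below uses: COUNT ignores NULLs, so a CASE with no ELSE branch inside COUNT(DISTINCT ...) counts each qualifying id once, however many times the join repeated it. A sketch in SQLite via Python's sqlite3 (the data, including the second user Mary, is hypothetical; the same COUNT(DISTINCT CASE ... END) form is also valid in PostgreSQL).

```python
import sqlite3

# Hypothetical data in an in-memory SQLite database; date('now', '-1 year')
# stands in for PostgreSQL's NOW() - interval '1 year'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE parties   (id INTEGER PRIMARY KEY, partydate TEXT, user_id INTEGER);
    CREATE TABLE questions (id INTEGER PRIMARY KEY, createdate TEXT, user_id INTEGER);
    INSERT INTO users     VALUES (0, 'John'), (1, 'Mary');
    INSERT INTO parties   VALUES (15, date('now', '-1 month'), 0);
    INSERT INTO questions VALUES (2, date('now', '-1 month'), 0),
                                 (7, date('now', '-2 months'), 0),
                                 (9, date('now', '-2 years'), 1);  -- too old to count
""")

rows = conn.execute("""
    SELECT users.id, users.name,
      -- CASE without ELSE yields NULL, which COUNT skips; DISTINCT collapses
      -- the repeated ids produced by the double LEFT JOIN.
      COUNT(DISTINCT CASE WHEN parties.partydate > date('now', '-1 year')
                          THEN parties.id END) AS numparties,
      COUNT(DISTINCT CASE WHEN questions.createdate > date('now', '-1 year')
                          THEN questions.id END) AS numquestions
    FROM users
      LEFT JOIN parties   ON users.id = parties.user_id
      LEFT JOIN questions ON users.id = questions.user_id
    GROUP BY users.id, users.name
    ORDER BY users.id
""").fetchall()

print(rows)  # [(0, 'John', 1, 2), (1, 'Mary', 0, 0)]
```

This gives correct counts, but the join still builds parties × questions rows per user before grouping, which is why pre-aggregating each table in its own subquery tends to scale better.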

A: 

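I won't write the code for you (since it's homework), but you'll want to put the two calculations into separate subqueries, each returning one row per user_id, and join those to users. That way no party row is paired with more than one question row before it is counted.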

Here's a template:

  SELECT users.id, users.name, 
         subquery1.result_of_calculation1, subquery2.result_of_calculation2
    FROM users
         LEFT JOIN (
            -- calculation 1: one row per user_id
         ) subquery1
         ON users.id = subquery1.user_id
         LEFT JOIN (
            -- calculation 2: one row per user_id
         ) subquery2
         ON users.id = subquery2.user_id;
Adam Bernier
You mean the SUM() calculations? I've tried this: SUM(SELECT 1 FROM parties WHERE parties.user_id=users.id AND parties.partydate > NOW() - interval '1 year') But I get a syntax error at the SELECT.
Airjoe
Yes. The SUM calculations can be left as they are, just moved into their own subqueries, each with its own FROM and a GROUP BY user_id.
Adam Bernier
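On the syntax error above: a subquery used as a value has to sit in its own parentheses, and a scalar subquery must return at most one row, so SUM(...) around it is the wrong shape. Dropping the SUM and putting COUNT(*) inside gives a correlated scalar subquery per column, which is another way to avoid the fan-out. A sketch in SQLite via Python's sqlite3, with hypothetical data and date('now', '-1 year') in place of NOW() - interval '1 year'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE parties   (id INTEGER PRIMARY KEY, partydate TEXT, user_id INTEGER);
    CREATE TABLE questions (id INTEGER PRIMARY KEY, createdate TEXT, user_id INTEGER);
    INSERT INTO users     VALUES (0, 'John'), (1, 'Mary');
    INSERT INTO parties   VALUES (15, date('now', '-1 month'), 0);
    INSERT INTO questions VALUES (2, date('now', '-1 month'), 0),
                                 (7, date('now', '-2 months'), 0),
                                 (9, date('now', '-2 years'), 1);
""")

rows = conn.execute("""
    SELECT users.id, users.name,
      -- Each parenthesized subquery yields exactly one value per user,
      -- so the outer query never joins parties to questions.
      (SELECT COUNT(*) FROM parties
        WHERE parties.user_id = users.id
          AND parties.partydate > date('now', '-1 year')) AS numparties,
      (SELECT COUNT(*) FROM questions
        WHERE questions.user_id = users.id
          AND questions.createdate > date('now', '-1 year')) AS numquestions
    FROM users
    ORDER BY users.id
""").fetchall()

print(rows)  # [(0, 'John', 1, 2), (1, 'Mary', 0, 0)]
```

COUNT(*) over no matching rows returns 0, so no COALESCE is needed in this form.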
A: 

Following Adam's tip, I've come up with this:

SELECT users.id, users.name, 
  COALESCE(tparties.ecount, 0) AS numparties,
  COALESCE(tquestions.pcount, 0) AS numquestions
FROM users
  LEFT JOIN (
    SELECT user_id, COUNT(parties.id) AS ecount 
    FROM parties 
    WHERE parties.partydate > NOW() - interval '1 year' 
    GROUP BY user_id) 
    AS tparties ON users.id = tparties.user_id
  LEFT JOIN (
    SELECT user_id, COUNT(questions.id) AS pcount 
    FROM questions 
    WHERE questions.createdate > NOW() - interval '1 year' 
    GROUP BY user_id) 
    AS tquestions ON users.id = tquestions.user_id
;
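The same pre-aggregation pattern can be checked in SQLite via Python's sqlite3, using hypothetical data and date('now', '-1 year') for NOW() - interval '1 year'. It uses LEFT JOINs; because every user_id the subqueries return also exists in users, FULL JOIN and LEFT JOIN produce the same rows here. The ORDER BY is added only to make the output deterministic: without ORDER BY, SQL guarantees no particular row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE parties   (id INTEGER PRIMARY KEY, partydate TEXT, user_id INTEGER);
    CREATE TABLE questions (id INTEGER PRIMARY KEY, createdate TEXT, user_id INTEGER);
    INSERT INTO users     VALUES (0, 'John'), (1, 'Mary');
    INSERT INTO parties   VALUES (15, date('now', '-1 month'), 0);
    INSERT INTO questions VALUES (2, date('now', '-1 month'), 0),
                                 (7, date('now', '-2 months'), 0),
                                 (9, date('now', '-2 years'), 1);
""")

rows = conn.execute("""
    SELECT users.id, users.name,
      -- Users with no row in a subquery get NULL, turned into 0 here.
      COALESCE(tparties.ecount, 0)   AS numparties,
      COALESCE(tquestions.pcount, 0) AS numquestions
    FROM users
      LEFT JOIN (SELECT user_id, COUNT(id) AS ecount   -- one row per user
                 FROM parties
                 WHERE partydate > date('now', '-1 year')
                 GROUP BY user_id) AS tparties
        ON users.id = tparties.user_id
      LEFT JOIN (SELECT user_id, COUNT(id) AS pcount   -- one row per user
                 FROM questions
                 WHERE createdate > date('now', '-1 year')
                 GROUP BY user_id) AS tquestions
        ON users.id = tquestions.user_id
    ORDER BY users.id
""").fetchall()

print(rows)  # [(0, 'John', 1, 2), (1, 'Mary', 0, 0)]
```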

The good news is that all users are listed and all the counts are right. The bad news is that the rows come out ordered by user_id, whereas the result the teacher gave us to check our queries against is seemingly unordered. I take this to mean the above query is not the answer the teacher is looking for. However, the result is the same, and considering the amount of time I've put into one problem, this is good enough for me. Thanks for the help.

Airjoe