I currently have two tables, one with documents, and another with ratings

doc_id | doc_groupid | doc_name | doc_time

and then

rating_id | rating_docid | rating_score

where rating_score is either -1 or 1.

What I need to do is have a single query that retrieves every column in the document table WHERE groupid = #, but also has columns which aggregate the ratings. I can retrieve a list of ratings using

SELECT rating_docid,
       SUM(CASE WHEN rating_type = 1 THEN 1 ELSE 0 END) AS UpVotes,
       SUM(CASE WHEN rating_type = -1 THEN 1 ELSE 0 END) AS DownVotes
FROM ratings
GROUP BY rating_docid

That gives me a list of documents (as long as they have been rated) with the number of upvotes and downvotes for each. I can also easily get a list of documents with

SELECT * FROM documents WHERE doc_groupid = #

But I have no idea how to do this without a subquery (using JOIN or LEFT JOIN), which, as I understand it, is too slow. Honestly, I have no idea how to do this with a subquery either.

So my question is:

  1. How can I do this with a speedy join?
  2. How can I do this with a subquery?

Thanks!

A: 

I guess you need something like

SELECT * 
FROM documents d
LEFT JOIN 
(
   SELECT rating_docid,
     SUM(CASE WHEN rating_type = 1 THEN 1 ELSE 0 END ) AS UpVotes,
     SUM(CASE WHEN rating_type = -1 THEN 1 ELSE 0 END) AS DownVotes
     FROM rating_table
     GROUP BY rating_docid
)r ON (r.rating_docid = d.doc_id)
WHERE d.doc_groupid = ....

It will probably also run faster if you restrict the derived table to the same group, so that only ratings for that group's documents are aggregated:

SELECT * 
FROM documents d
LEFT JOIN 
(
   SELECT rating_docid,
     SUM(CASE WHEN rating_type = 1 THEN 1 ELSE 0 END ) AS UpVotes,
     SUM(CASE WHEN rating_type = -1 THEN 1 ELSE 0 END) AS DownVotes
     FROM rating_table
     INNER JOIN documents d1 ON (d1.doc_id = rating_docid )
     WHERE d1.doc_groupid =...
     GROUP BY rating_docid
)r ON (r.rating_docid = d.doc_id)
WHERE d.doc_groupid = ....
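
To see what the derived-table join returns, here is a minimal sketch that runs the first query against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL here, the sample rows are made up, and the function name votes_via_derived_table is hypothetical. The COALESCE calls and the ORDER BY are additions for readable output, not part of the query above.

```python
import sqlite3


def votes_via_derived_table(conn, group_id):
    """LEFT JOIN documents to a pre-aggregated derived table of ratings."""
    sql = """
        SELECT d.doc_id, d.doc_name,
               COALESCE(r.UpVotes, 0)   AS UpVotes,    -- 0 for unrated documents
               COALESCE(r.DownVotes, 0) AS DownVotes
        FROM documents d
        LEFT JOIN (
            SELECT rating_docid,
                   SUM(CASE WHEN rating_type = 1  THEN 1 ELSE 0 END) AS UpVotes,
                   SUM(CASE WHEN rating_type = -1 THEN 1 ELSE 0 END) AS DownVotes
            FROM rating_table
            GROUP BY rating_docid
        ) r ON r.rating_docid = d.doc_id
        WHERE d.doc_groupid = ?
        ORDER BY d.doc_id
    """
    return conn.execute(sql, (group_id,)).fetchall()


# Hypothetical sample data: documents 1 and 2 are in group 10, document 2 is unrated.
derived_conn = sqlite3.connect(":memory:")
derived_conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, doc_groupid INTEGER,
                            doc_name TEXT, doc_time TEXT);
    CREATE TABLE rating_table (rating_id INTEGER PRIMARY KEY,
                               rating_docid INTEGER, rating_type INTEGER);
    INSERT INTO documents VALUES (1, 10, 'alpha', '2011-01-01'),
                                 (2, 10, 'beta',  '2011-01-02'),
                                 (3, 20, 'gamma', '2011-01-03');
    INSERT INTO rating_table (rating_docid, rating_type)
        VALUES (1, 1), (1, 1), (1, -1), (3, -1);
""")

derived_rows = votes_via_derived_table(derived_conn, 10)
print(derived_rows)
```

Without the COALESCE calls, the unrated document would come back with NULL in both vote columns, because the LEFT JOIN finds no matching row in the derived table.
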
a1ex07
A: 

This might look strange because of the two joins but, assuming you have indexed your columns properly, it should perform very well.

SELECT d.doc_id, d.doc_name, d.doc_time,
       SUM(rd.rating_type) * -1 as DownVotes,
       SUM(ru.rating_type) as UpVotes
FROM documents d
    LEFT JOIN ratings rd ON d.doc_id = rd.rating_docid AND rd.rating_type < 0
    LEFT JOIN ratings ru ON d.doc_id = ru.rating_docid AND ru.rating_type > 0
GROUP BY d.doc_id

You might want to add COALESCE (http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#function_coalesce) so that the query returns 0 instead of NULL when there is nothing to join.

SELECT d.doc_id, 
       COALESCE(SUM(rd.rating_type), 0) * -1 as DownVotes,
       COALESCE(SUM(ru.rating_type), 0) as UpVotes
FROM documents d ...
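
One caveat worth checking with the two-join form: when a document has both upvotes and downvotes, the two LEFT JOINs pair every downvote row with every upvote row, so the SUMs can come out inflated. The sketch below shows this on made-up data, using SQLite through Python's sqlite3 module as a stand-in for MySQL, with each join filtering on its own alias (rd for downvotes, ru for upvotes). Counting distinct rating_id values is one possible workaround, offered as an assumption rather than as part of the answer above.

```python
import sqlite3

fanout_conn = sqlite3.connect(":memory:")
fanout_conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, doc_name TEXT);
    CREATE TABLE ratings (rating_id INTEGER PRIMARY KEY,
                          rating_docid INTEGER, rating_type INTEGER);
    INSERT INTO documents VALUES (1, 'alpha');
    -- hypothetical data: two upvotes and one downvote for document 1
    INSERT INTO ratings (rating_docid, rating_type) VALUES (1, 1), (1, 1), (1, -1);
""")

# Shared FROM/JOIN/GROUP BY part of the two-join query.
TWO_JOINS = """
    FROM documents d
    LEFT JOIN ratings rd ON d.doc_id = rd.rating_docid AND rd.rating_type < 0
    LEFT JOIN ratings ru ON d.doc_id = ru.rating_docid AND ru.rating_type > 0
    GROUP BY d.doc_id
"""

# SUM over the joined rows: the single downvote appears once per upvote row.
summed = fanout_conn.execute(
    "SELECT COALESCE(SUM(rd.rating_type), 0) * -1, COALESCE(SUM(ru.rating_type), 0)"
    + TWO_JOINS
).fetchone()

# COUNT(DISTINCT rating_id) ignores the duplicated rows.
counted = fanout_conn.execute(
    "SELECT COUNT(DISTINCT rd.rating_id), COUNT(DISTINCT ru.rating_id)" + TWO_JOINS
).fetchone()

print("SUM (downvotes, upvotes):           ", summed)
print("COUNT DISTINCT (downvotes, upvotes):", counted)
```

On this data the SUM version reports two downvotes even though only one exists, while the COUNT(DISTINCT ...) version reports one downvote and two upvotes.
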

I wouldn't recommend a correlated subquery (one that is evaluated per row) if you have many documents to check, because another query is executed for every document, which means a lot of overhead.

SchlaWiener
Yep, that's why I'm trying to avoid subqueries. There can be as many as 100 documents on a single page, and with many pageviews it will add up quickly. My other backup plan is to just add the aggregate columns to the document table itself and manually update them from the ratings table when things change, but that seems so dirty.
Brad Huizenga
A: 

Use:

   SELECT d.doc_id,
          d.doc_name,
          d.doc_time, 
          COALESCE(SUM(CASE WHEN r.rating_type = 1 THEN 1 ELSE 0 END), 0) AS upvotes,
          COALESCE(SUM(CASE WHEN r.rating_type = -1 THEN 1 ELSE 0 END), 0) AS downvotes
     FROM DOCUMENTS d
LEFT JOIN RATINGS r ON r.rating_docid = d.doc_id
    WHERE d.doc_groupid = ?
 GROUP BY d.doc_id, d.doc_name, d.doc_time

The doc_time column seems odd to me; it makes me think you can have duplicates with different time values...
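
As a quick check of the query above, here is a minimal sketch that runs it against an in-memory SQLite database from Python (SQLite standing in for MySQL). The sample rows and the function name docs_with_votes are made up for illustration, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3


def docs_with_votes(conn, group_id):
    """One LEFT JOIN plus GROUP BY: one row per document, votes aggregated."""
    sql = """
        SELECT d.doc_id, d.doc_name, d.doc_time,
               COALESCE(SUM(CASE WHEN r.rating_type = 1  THEN 1 ELSE 0 END), 0) AS upvotes,
               COALESCE(SUM(CASE WHEN r.rating_type = -1 THEN 1 ELSE 0 END), 0) AS downvotes
        FROM documents d
        LEFT JOIN ratings r ON r.rating_docid = d.doc_id
        WHERE d.doc_groupid = ?
        GROUP BY d.doc_id, d.doc_name, d.doc_time
        ORDER BY d.doc_id
    """
    return conn.execute(sql, (group_id,)).fetchall()


# Hypothetical sample data: document 2 has no ratings at all.
group_conn = sqlite3.connect(":memory:")
group_conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, doc_groupid INTEGER,
                            doc_name TEXT, doc_time TEXT);
    CREATE TABLE ratings (rating_id INTEGER PRIMARY KEY,
                          rating_docid INTEGER, rating_type INTEGER);
    INSERT INTO documents VALUES (1, 10, 'alpha', '2011-01-01'),
                                 (2, 10, 'beta',  '2011-01-02'),
                                 (3, 20, 'gamma', '2011-01-03');
    INSERT INTO ratings (rating_docid, rating_type)
        VALUES (1, 1), (1, 1), (1, -1), (3, -1);
""")

group_rows = docs_with_votes(group_conn, 10)
for row in group_rows:
    print(row)
```

Because ratings is joined only once, each rating row is counted exactly once, and the unrated document still appears with zero upvotes and zero downvotes.
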

JOIN vs Subquery

JOINs (INNER and OUTER) are not subqueries. To make things more complicated, subqueries can mean:

  • a query in the SELECT clause (AKA sub-select):

    SELECT (SELECT col FROM TABLE) AS col2, ...
    
  • a query in the WHERE or HAVING clauses:

    WHERE col = (SELECT column FROM TABLE)
    HAVING col IN (SELECT cols FROM TABLE)
    
  • a query in the JOIN (AKA derived table, inline view):

    LEFT JOIN (SELECT u.user,
                      COUNT(*) AS num
                 FROM TABLE u
             GROUP BY u.user) x ON x.user = t.column
    

There's no hard'n'fast rule about one being better than the other because it all depends on:

  • table structure
  • data
  • indexing and table statistics
  • expected results

All that really matters is the work is done in as few passes over a table as necessary--ideally one.
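
To make the terminology concrete, the sketch below computes the same upvote count twice: once with a sub-select in the SELECT clause and once with a derived table in the JOIN. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and the sample rows are invented. It only shows that the two forms return the same rows; it says nothing about which one is faster on a given database.

```python
import sqlite3

# Sub-select in the SELECT clause: correlated on d.doc_id.
SUBSELECT_SQL = """
    SELECT d.doc_id,
           (SELECT COUNT(*)
              FROM ratings r
             WHERE r.rating_docid = d.doc_id
               AND r.rating_type = 1) AS upvotes
    FROM documents d
    ORDER BY d.doc_id
"""

# Derived table (inline view) in the JOIN: ratings are aggregated once.
DERIVED_SQL = """
    SELECT d.doc_id, COALESCE(x.upvotes, 0) AS upvotes
    FROM documents d
    LEFT JOIN (SELECT rating_docid, COUNT(*) AS upvotes
                 FROM ratings
                WHERE rating_type = 1
             GROUP BY rating_docid) x ON x.rating_docid = d.doc_id
    ORDER BY d.doc_id
"""

compare_conn = sqlite3.connect(":memory:")
compare_conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, doc_name TEXT);
    CREATE TABLE ratings (rating_id INTEGER PRIMARY KEY,
                          rating_docid INTEGER, rating_type INTEGER);
    -- hypothetical data: document 2 has no ratings
    INSERT INTO documents VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO ratings (rating_docid, rating_type) VALUES (1, 1), (1, 1), (1, -1);
""")

subselect_rows = compare_conn.execute(SUBSELECT_SQL).fetchall()
derived_rows_cmp = compare_conn.execute(DERIVED_SQL).fetchall()
print(subselect_rows)
print(derived_rows_cmp)
```

Which form the optimizer handles better depends on the factors listed above, so it is worth comparing the execution plans on real data before settling on one.
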

OMG Ponies
Thanks! This works great... now I just have to see if I can throw the current user's vote in there too :P Probably with another join. I have no idea why I couldn't get this to work, so thanks a ton!
Brad Huizenga