views:

414

answers:

4

Hi!

I have one table called gallery. For each row in gallery there are several rows in the table picture. One picture belongs to one gallery. Then there is the table vote. There each row is an upvote or a downvote for a certain gallery. Here is the (simplified) structure:

gallery ( gallery_id )
picture ( picture_id, picture_gallery_ref )
vote ( vote_id, vote_value, vote_gallery_ref )

Now I want one query to give me the following information: All galleries with their own data fields and the number of pictures that are connected to the gallery and the sumarized value of the votes.

Here is my query, but due to the multiple joining the aggregated values are not the right ones. (At least when there is more than one row of either pictures or votes.)

SELECT 
  *, SUM( vote_value ) as score, COUNT( picture_id ) AS pictures
FROM 
  gallery
LEFT JOIN 
  vote
  ON gallery_id = vote_gallery_ref
LEFT JOIN 
  picture
  ON gallery_id = picture_gallery_ref
GROUP BY gallery_id

Because I have noticed that COUNT( DISTINCT picture_id ) gives me the correct number of pictures I tried this:

( SUM( vote_value ) / GREATEST( COUNT( DISTINCT picture_id ), 1 ) ) AS score

It works in this example, but what if there were more joins in one query?

Just want to know whether there is a better or more 'elegant' way this problem can be solved. Also I'd like to know whether my solution is MySQL-specific or standard SQL?

+1  A: 

This quote from William of Okham applies here:

Enita non sunt multiplicanda praeter necessitatem

(Latin for "entities are not to be multiplied beyond necessity").

You should reconsider why do you need this to be done in a single query? It's true that a single query has less overhead than multiple queries, but if the nature of that single query becomes too complex, both for you to develop, and for the RDBMS to execute, then run separate queries.

Bill Karwin
Seems to be the best advice since obviously there is no 'elegant' solution I was not aware of.
okoman
Bah, now I regret coming up with (and testing) my query below; since the best answer to your question is a non-answer
matt b
@matt b: It's worthwhile showing how complex the solution has to be, so one can decide with more confidence that it would be simpler to run separate queries.
Bill Karwin
@matt b Your solution might work but however it doesn't look more elegant than my hacked query above ;)
okoman
+2  A: 

Or just use subqueries...

I don't know if this is valid MySQL syntax, but you might be able to do something similar to:

SELECT
  gallery.*, a.score, b.pictures
LEFT JOIN
(
  select vote_gallery_ref, sum(vote_value) as score
  from vote
  group by vote_gallery_ref
) a ON gallery_id = vote_gallery_ref
LEFT JOIN 
(
  select picture_gallery_ref, count(picture_id) as pictures
  from picture
  group by picture_gallery_ref
) b ON gallery_id = picture_gallery_ref
Kev
A: 

As Bill Karwin said, doing this all within one query is pretty ugly.

But, if you have to do it, joining and selecting non-aggregate data with aggregate data requires joining against subqueries (I haven't used SQL that much in the past few years so I actually forgot the proper term for this).

Let's assume your gallery table has additional fields name and state:

select g.gallery_id, g.name, g.state, i.num_pictures, j.sum_vote_values
from gallery g
inner join (
  select g.gallery_id, count(p.picture_id) as 'num_pictures'
  from gallery g
  left join picture p on g.gallery_id = p.picture_gallery_ref
  group by g.gallery_id) as i on g.gallery_id = i.gallery_id
left join (
  select g.gallery_id, sum(v.vote_value) as 'sum_vote_values'
  from gallery g
  left join vote v on g.gallery_id = v.vote_gallery_ref
  group by g.gallery_id
) as j on g.gallery_id = j.gallery_id

This will yield a result set that looks like:

gallery_id, name, state, num_pictures, sum_vote_values
1, 'Gallery A', 'NJ', 4, 19
2, 'Gallery B', 'NY', 3, 32
3, 'Empty gallery', 'CT', 0,
matt b
+1  A: 

How often do you add/change vote records?

How often do you add/remove picture records?

How often do you run this query for these totals?

It might be better to create total fields on the gallery table (total_pictures, total_votes, total_vote_values).

When you add or remove a record on the picture table you also update the total on the gallery table. This could be done using triggers on the picture table to automatically update the gallery table. It could also be done using a transaction combining two SQL statements to update the picture table and the gallery table. When you add a record on the picture table increment the total_pictures field on the gallery table. When you delete a record on the picture table decrement the total_pictures field.

Similary when a vote record is added or removed or the vote_value changes you update the total_votes and total_vote_values fields. Adding a record increments the total_votes field and adds vote_values to total_vote_values. Deleting a record decrements the total_votes field and subtracts vote_values from total_vote_values. Updating vote_values on a vote record should also update total_vote_values with the difference (subtract old value, add new value).

Your query now becomes trivial - it's just a straightforward query from the gallery table. But this is at the expense of more complex updates to the picture and vote tables.

Paul Morgan