Hi, I'm trying to query a table with two joins: one to count the comments using COUNT and one to total the vote values using SUM. The problem is that the two joins affect each other. Instead of showing:

3 votes 2 comments

I get 3 * 2 = 6 votes and 2 * 3 = 6 comments.

This is the query I'm using:

SELECT t.*, COUNT(c.id) as comments, COALESCE(SUM(v.vote), 0) as votes
FROM (topics t)
LEFT JOIN comments c ON c.topic_id = t.id
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9
A: 
SELECT t.*, COUNT(DISTINCT c.id) as comments, COALESCE(SUM(v.vote), 0) as votes
FROM (topics t)
LEFT JOIN comments c ON c.topic_id = t.id
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9
Dave Markle
I now get the correct number of comments, but the sum of the votes is still incorrect. I tried adding DISTINCT to v.vote too, but that didn't work.
Dennis
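
To see why DISTINCT repairs the count but not the sum, here is a minimal sketch that reproduces the problem. It assumes SQLite through Python's sqlite3 module rather than the asker's actual database, and the table layout and sample rows (two comments, three votes of 1 each) are made up for illustration.

```python
import sqlite3

# Hypothetical schema and sample data: topic 9 has 2 comments and 3 votes of +1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics   (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, topic_id INTEGER);
    CREATE TABLE votes    (id INTEGER PRIMARY KEY, topic_id INTEGER, vote INTEGER);
    INSERT INTO topics   VALUES (9, 'example topic');
    INSERT INTO comments VALUES (1, 9), (2, 9);
    INSERT INTO votes    VALUES (1, 9, 1), (2, 9, 1), (3, 9, 1);
""")

# The same pair of joins used in the question.
JOINS = """
    FROM topics t
    LEFT JOIN comments c ON c.topic_id = t.id
    LEFT JOIN votes v ON v.topic_id = t.id
    WHERE t.id = 9
"""

# Every comment row is paired with every vote row: 2 * 3 = 6 joined rows.
joined_rows = conn.execute("SELECT c.id, v.id, v.vote" + JOINS).fetchall()

# Aggregating over the joined rows counts and sums each value several times.
plain = conn.execute(
    "SELECT COUNT(c.id), COALESCE(SUM(v.vote), 0)" + JOINS
).fetchone()

# DISTINCT helps COUNT because comment ids are unique, but SUM(DISTINCT ...)
# collapses equal vote values, so three votes of 1 add up to 1, not 3.
distinct = conn.execute(
    "SELECT COUNT(DISTINCT c.id), COALESCE(SUM(DISTINCT v.vote), 0)" + JOINS
).fetchone()

print(len(joined_rows))  # 6
print(plain)             # (6, 6)
print(distinct)          # (2, 1)
```

With this sample data the correct answer is 2 comments and 3 votes, so neither form of the joined query gets the vote total right.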
A: 
SELECT t.*, COUNT(c.id) as comments, COALESCE(SUM(v.vote), 0) as votes
FROM (topics t)
LEFT JOIN comments c ON c.topic_id = t.id
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9
GROUP BY t.id

or perhaps

SELECT `topics`.*,
(
    SELECT COUNT(*)
    FROM `comments`
    WHERE `topic_id` = `topics`.`id`
) AS `num_comments`,
(
    SELECT IFNULL(SUM(`vote`), 0)
    FROM `votes`
    WHERE `topic_id` = `topics`.`id`
) AS `vote_total`
FROM `topics`
WHERE `id` = 9
chaos
A: 

What you're doing is an SQL antipattern that I call the Goldberg Machine. Why make the problem so much harder by forcing it to be done in a single SQL query?

Here is how I would really solve this problem:

SELECT t.*, COUNT(c.id) as comments
FROM topics t
LEFT JOIN comments c ON c.topic_id = t.id
WHERE t.id = 9;

SELECT t.*, SUM(v.vote) as votes
FROM topics t
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9;

As you have found, combining these two into one query results in a Cartesian product. There may be clever and subtle ways to force it to give you the correct answer in one query, but what happens when you need a third statistic? It's much simpler to do it in two queries.
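
A minimal sketch of the two-query approach, assuming SQLite through Python's sqlite3 module and made-up sample rows (two comments, three votes of 1 each). The schema is a guess at the asker's tables, not something taken from the question.

```python
import sqlite3

# Hypothetical schema and sample data: topic 9 has 2 comments and 3 votes of +1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics   (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, topic_id INTEGER);
    CREATE TABLE votes    (id INTEGER PRIMARY KEY, topic_id INTEGER, vote INTEGER);
    INSERT INTO topics   VALUES (9, 'example topic');
    INSERT INTO comments VALUES (1, 9), (2, 9);
    INSERT INTO votes    VALUES (1, 9, 1), (2, 9, 1), (3, 9, 1);
""")

topic_id = 9

# Query 1: only the comments join, so each comment appears exactly once.
comment_count = conn.execute(
    "SELECT COUNT(c.id) FROM topics t "
    "LEFT JOIN comments c ON c.topic_id = t.id "
    "WHERE t.id = ?",
    (topic_id,),
).fetchone()[0]

# Query 2: only the votes join, so each vote is summed exactly once.
# COALESCE turns the NULL from a topic without votes into 0.
vote_total = conn.execute(
    "SELECT COALESCE(SUM(v.vote), 0) FROM topics t "
    "LEFT JOIN votes v ON v.topic_id = t.id "
    "WHERE t.id = ?",
    (topic_id,),
).fetchone()[0]

print(comment_count, vote_total)  # 2 3
```

Because each query touches only one child table, a third statistic would just be a third query of the same shape.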

Bill Karwin
Won't that result in more loading and more work for the database? Or is it harder for the database to work with multiple joins than to run them separately?
Dennis
Most databases have internal caching for data and index pages, so it may not be a problem. Also the work that's necessary to compensate for the Cartesian product may be worse. Partly it depends on your specific system and your data, so the most accurate answer requires testing in *your* environment.
Bill Karwin