
Long story short: what fields should I put after the GROUP BY clause?

SELECT questions.question_id, questions.title, questions.content, questions.view_count, questions.posted_on, users.user_id, users.group_id, users.username, users.first_name, users.last_name, COUNT(answers.answer_id) AS answer_count 
FROM (questions) 
JOIN answers ON questions.question_id = answers.question_id 
JOIN users ON questions.user_id = users.user_id 
WHERE `questions`.`publish` = 'Y' AND `questions`.`deleted_at` IS NULL AND `users`.`blocked` = 'N' 
GROUP BY questions.question_id

Should I put every non-aggregated field mentioned in the SELECT, or is just one of them fine (e.g. just question_id)? I am confused because either way the results are the same. What is the difference?
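To illustrate the difference, here is a minimal, self-contained sketch using a hypothetical table t (not the asker's schema). When every selected column has exactly one value per group (as each question_id here presumably has one title, one author and so on), grouping by the key alone and grouping by every column return the same rows, which is why the results looked identical. The short form only becomes ambiguous when a selected column can vary inside a group:

```sql
-- Hypothetical demo table, not part of the question's schema.
CREATE TABLE t (grp INTEGER, tag VARCHAR(10), val INTEGER);
INSERT INTO t VALUES (1, 'a', 10), (1, 'b', 20), (2, 'c', 30);

-- tag is NOT determined by grp: group 1 contains both 'a' and 'b'.
-- Standard SQL (and MySQL with ONLY_FULL_GROUP_BY) rejects the query
-- below; engines that accept it return an arbitrary tag for group 1:
--   SELECT grp, tag, SUM(val) FROM t GROUP BY grp;

-- Grouping by every non-aggregated column is unambiguous and returns
-- (1, 'a', 10), (1, 'b', 20), (2, 'c', 30).
SELECT grp, tag, SUM(val) AS total
FROM t
GROUP BY grp, tag
ORDER BY grp, tag;
```

If MySQL accepts the one-column version of a query, that is because ONLY_FULL_GROUP_BY is off (the default before MySQL 5.7.5); it does not mean the extra columns are being grouped.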

Tutorials on the web all seem to give an example using just two fields: one aggregated field and one normal field.

Update: OK, it looks like I have to put all of them to get an accurate result. That raises other questions: how accurate is accurate? Wouldn't one do just fine? What about the impact on performance?
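One column can be enough when the database can prove that the other selected columns are determined by it. A sketch, assuming question_id and user_id are the primary keys of questions and users (the question does not show its table definitions, so the keys and column types below are guesses). Grouping by both keys is then accepted by PostgreSQL 9.1+ and by MySQL 5.7.5+ with ONLY_FULL_GROUP_BY, since every other selected column depends on one of them. MySQL may also follow the join equality q.user_id = u.user_id and accept GROUP BY q.question_id alone; PostgreSQL only recognizes primary keys that are listed in the GROUP BY, so it needs both.

```sql
-- Assumed schema: keys and column types are guesses for illustration.
CREATE TABLE users (
  user_id    INTEGER PRIMARY KEY,
  group_id   INTEGER,
  username   VARCHAR(50),
  first_name VARCHAR(50),
  last_name  VARCHAR(50),
  blocked    CHAR(1)
);
CREATE TABLE questions (
  question_id INTEGER PRIMARY KEY,
  user_id     INTEGER NOT NULL,
  title       VARCHAR(200),
  content     TEXT,
  view_count  INTEGER,
  posted_on   TIMESTAMP,
  publish     CHAR(1),
  deleted_at  TIMESTAMP NULL
);
CREATE TABLE answers (
  answer_id   INTEGER PRIMARY KEY,
  question_id INTEGER NOT NULL
);

-- Every other selected column depends on one of the two grouped
-- primary keys, so each group holds exactly one value per column.
SELECT q.question_id, q.title, q.content, q.view_count, q.posted_on,
       u.user_id, u.group_id, u.username, u.first_name, u.last_name,
       COUNT(a.answer_id) AS answer_count
FROM questions q
JOIN answers a ON q.question_id = a.question_id
JOIN users u   ON q.user_id = u.user_id
WHERE q.publish = 'Y' AND q.deleted_at IS NULL AND u.blocked = 'N'
GROUP BY q.question_id, u.user_id;
```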

A: 

You need to put all of the non-aggregated columns in the GROUP BY to get an accurate result.

Shamik
How accurate is accurate? Wouldn't one do just fine? What about the impact on performance?
andyk
A: 

You must do: GROUP BY questions.question_id, questions.title, questions.content, questions.view_count, questions.posted_on, users.user_id, users.group_id, users.username, users.first_name, users.last_name

(i.e. all of them)

or

you can join to a subquery that does the counting, so you only need to group by one column (inside the subquery).

EDIT: here is an example of the second option (I haven't tested it, but it should work):

SELECT q.question_id, q.title, q.content, q.view_count, q.posted_on, u.user_id, u.group_id, u.username, u.first_name, u.last_name, r.AN_ANSWER_COUNT 
FROM questions q 
JOIN users u ON q.user_id = u.user_id 
-- no JOIN to answers here: the subquery r already counts them, one row per question
LEFT JOIN (SELECT a.question_id, COUNT(a.answer_id) AS AN_ANSWER_COUNT
           FROM answers a
           -- optional: WHERE (your_condition)
           GROUP BY a.question_id) r 
       ON q.question_id = r.question_id
WHERE q.publish = 'Y' AND q.deleted_at IS NULL AND u.blocked = 'N'
waqasahmed
Would you mind elaborating on the inner join alternative?
andyk
Tinkered with it (alias problems) and I've got it running, but it fetches a question record for every answer (11 questions * 36 answers = 396 records). Or did I get that wrong?
andyk
Sorry, small mistake: where I wrote "on q.question_id = a.question_id", it should be "q.question_id = r.question_id".
waqasahmed
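The derived-table approach also changes which questions appear: the asker's original inner JOIN to answers drops questions that have no answers, whereas a LEFT JOIN to the subquery keeps them with a NULL count. A minimal sketch with cut-down hypothetical tables (only the columns it needs), using COALESCE to report 0 instead of NULL:

```sql
-- Cut-down hypothetical tables; only the columns this sketch needs.
CREATE TABLE questions (question_id INTEGER PRIMARY KEY, title VARCHAR(200));
CREATE TABLE answers (answer_id INTEGER PRIMARY KEY, question_id INTEGER NOT NULL);
INSERT INTO questions VALUES (1, 'has answers'), (2, 'no answers yet');
INSERT INTO answers VALUES (10, 1), (11, 1);

-- The count is computed once per question inside the derived table r,
-- so the outer query needs no GROUP BY at all.
SELECT q.question_id, q.title,
       COALESCE(r.answer_count, 0) AS answer_count
FROM questions q
LEFT JOIN (SELECT question_id, COUNT(answer_id) AS answer_count
           FROM answers
           GROUP BY question_id) r
       ON q.question_id = r.question_id
ORDER BY q.question_id;
-- Question 1 gets 2; question 2 gets 0 (it would be NULL without COALESCE).
```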
A: 

"You need to put all the non-aggregated columns for accurate result"

True, but I would add that you should list all the columns in the order you want them to be grouped (which could be important for you).

Edit: "accurate" means that if you don't do so, your query will simply fail with an error. As for performance, the more fields there are in your GROUP BY, the more performance decreases, but that is not really a surprise.

Clement Herreman