views:

126

answers:

2

I'm trying to figure out if there's a way to do a somewhat-complex aggregation in Django using its ORM, or if I'm going to have to use extra() to stick in some raw SQL.

Here are my object models (stripped to show just the essentials):

class Submission(Models.model)
  favorite_of = models.ManyToManyField(User, related_name="favorite_submissions")

class Response(Models.model)
  submission = models.ForeignKey(Submission)
  voted_up_by = models.ManyToManyField(User, related_name="voted_up_responses")

What I want to do is sum all the votes for a given submission: that is, all of the votes for any of its responses, and then also including the number of people who marked the submission as a favorite.

I have the first part working using the following code; this returns the total votes for all responses of each submission:

submission_list = Response.objects\
  .values('submission')\
  .annotate(votes=Count('voted_up_by'))\
  .filter(votes__gt=0)\
  .order_by('-votes')[:TOP_NUM]

(So after getting the vote total, I sort in descending order and return the top TOP_NUM submissions, to get a "best of" listing.)

That part works. Is there any way you can suggest to include the number of people who have favorited each submission in its votes? (I'd prefer to avoid extra() for portability, but I'm thinking it may be necessary, and I'm willing to use it.)

EDIT: I realized after reading the suggestions below that I should have been clearer in my description of the problem. The ideal solution would be one that allowed me to sort by total votes (the sum of voted_up_by and favorited) and then pick just the top few, all within the database. If that's not possible then I'm willing to load a few of the fields of each response and do the processing in Python; but since I'll be dealing with 100,000+ records, it'd be nice to avoid that overhead. (Also, to Adam and Dmitry: I'm sorry for the delay in responding!)

+1  A: 

One possibility would be to re-arrange your current query slightly. What if you tried something like the following:

submission_list = Response.objects\
    .annotate(votes=Count('voted_up_by'))\
    .filter(votes__gt=0)\
    .order_by('-votes')[:TOP_NUM]
submission_list.query.group_by = ['submission_id']

This will return a queryset of Response objects (objects with the same Submission will be lumped together). In order to access the related submission and/or the favorite_of list/count, you have two options:

num_votes = submission_list[0].votes
submission = submission_list[0].submission
num_favorite = submission.favorite_of.count()

or...

submissions = []
for response in submission_list:
    submission = response.submission
    submission.votes = response.votes
    submissions.append(submission)
num_votes = submissions[0].votes
submission = submissions[0]
num_favorite = submission.favorite_of.count()

Basically the first option has the benefit of still being a queryset, but you have to be sure to access the submission object in order to get any info about the submission (since each object in the queryset is technically a Response). The second option has the benefit of being a list of the submissions with both the favorite_of list as well as the votes, but it is no longer a queryset (so be sure you don't need to alter the query anymore afterwards).

Adam
A: 

You can count favorites in another query like

favorite_list = Submission.objects.annotate(favorites=Count(favorite_of))

After that you add the values from two lists:

total_votes = {}
for item in submission_list:
    total_votes[item.submission.id] = item.voted_by
for item in favorite_list:
    has_votes = total_votes.get(item.id, 0)
    total_votes[item.id] = has_votes + item.favorites

I am using ids in the dictionary because Submission objects will not be identical. If you need the Submissions themselves, you may use one more dictionary or store tuple (submission, votes) instead of just votes.

Added: this solution is better than the previous because you have only two DB requests.

Dmitry Risenberg