views:

78

answers:

3

Hi folks,

I'm trying to compute the average of a field over various subsets of a queryset.

Player.objects.order_by('-score').filter(sex='male').aggregate(Avg('level'))

This works perfectly!


But... if I try to compute it for the top 50 players it does not work.

Player.objects.order_by('-score').filter(sex='male')[:50].aggregate(Avg('level'))

This last one returns the exact same result as the query above it, which is wrong.


What am I doing wrong?

Help would be very much appreciated!

+2  A: 
topfifty = Player.objects.order_by('-score')[:50]
Player.objects.filter(sex='male',pk__in=topfifty).aggregate(avglevel=Avg('level'))

edit: i haven't tested this, but i think you get the idea of where i'm going.

topfifty = Player.objects.order_by('-score')[:50]
ids = []
for t in topfifty:
    ids += [t.id]

Player.objects.filter(sex='male',pk__in=ids).aggregate(avglevel=Avg('level'))

it's kind of hackish but the best i can think of. perhaps consider filtering for male and taking the top 50 males instead of this which takes the top fifty and then filters out the males.

Brandon H
@Brandon: I'm confused... that returns the average level for each player? Each player has 1 level field, I'm trying to compute the average over a group of players! Anyway thanks for the answer. :)
RadiantHex
ah i will adjust accordingly.
Brandon H
@Brandon: thanks!! Worked perfectly!! :D
RadiantHex
sweet. i hope you click the checkmark.
Brandon H
@Brandon: I had a slight problem with the implementation. Can you edit **topfifty** to become a list of IDs? I promise I'll tick it after that... I wouldn't have been able to come up with that :)
RadiantHex
hmm i thought it was supposed to work the way i had it before. this is all i can think of to extract a list of ids.
Brandon H
@Brandon: thanks you made my day! :)
RadiantHex
You can skip that for loop and just pass in the target query set: `...filter(sex='male', pk__in=topfifty)`. Easier to read and you hold off on hitting the DB until the aggregate() call.
istruble
that's what i had before, but he had an issue and needed the list of ids. i'm sure there's a better way to get the ids, but that's what i could think of from memory.
Brandon H
Gotcha, evolving question leading to an evolving answer. I was just trying to suggest skipping the list of ids entirely.
istruble
A: 

Hmm. The docs say

"Slicing. As explained in Limiting QuerySets, a QuerySet can be sliced, using Python's array-slicing syntax. Usually slicing a QuerySet returns another (unevaluated) QuerySet, but Django will execute the database query if you use the "step" parameter of slice syntax." http://docs.djangoproject.com/en/dev/ref/models/querysets/

So I'd try

Player.objects.order_by('-score').filter(sex='male')[0:50:1].aggregate(Avg('level'))
Vicki Laidler
@Vicki: thanks. Unfortunately it returns **'list' object has no attribute 'aggregate'** :) Would have been awesome if it worked!
RadiantHex
+3  A: 

Break the problem up into two logical steps; figure out the target set, perform the aggregate calculation.

top50_male_players = Player.objects.filter(sex='male').order_by('-score')[:50]
result = Player.objects.filter(pk__in=top50_male_players).aggregate(Avg('level'))

Thanks to lazy evaluation of the QuerySet it will be performed in a single DB operation. You can play with this in the shell to verify the query counts.

> from django.db import connection
> connection.queries = []
> top50_male_players = Player.objects.filter(sex='male').order_by('-score')[:50]
> len(connection.queries)
0
> result = Player.objects.filter(pk__in=top50_male_players).aggregate(Avg('level'))
> len(connection.queries)
1
> result
{'level__avg': <some number>}
> len(connection.queries)
1
istruble
@istruble: thank for your answer! unfortunately I don't know why **pk__in=queryset** is not returning the right results. But does indeed work if I build an array of IDs instead.
RadiantHex
That is very strange. Are you using anything other than filter() order_by() and []/slice to come up with your `queryset` value? I'll take another look if you paste the line you use to create `queryset` in another comment.
istruble