views:

489

answers:

5

I have a function that looks like this:

def post_count(self):
        return self.thread_set.aggregate(num_posts=Count('post'))['num_posts']

I only want to count posts that have their status marked as 'active'. Is there an easy way to add a filter before the Count function?

Model Definitions:

class Category(models.Model):
    name = models.CharField(max_length=100)
    slug = models.SlugField(max_length=100, blank=True, primary_key=True)
    ordering = models.IntegerField(max_length=3, default=0)

    @property
    def thread_count(self):
        return self.thread_set.all().count()

    @property
    def post_count(self):
        return self.thread_set.aggregate(num_posts=Count('post'))['num_posts']

class Thread(models.Model):
    user = models.ForeignKey(User)
    category = models.ForeignKey(Category)
    title = models.CharField(max_length=100)
    slug = models.SlugField(max_length=100)
    content = models.TextField()
    created = models.DateTimeField(auto_now_add=True)
    latest_activity = models.DateTimeField(auto_now_add=True)

class Post(models.Model):
    thread = models.ForeignKey(Thread)
    parent = models.ForeignKey('Post', null=True, blank=True)
    display_name = models.CharField(max_length=100)
    email = models.EmailField(db_index=True)
    ip_address = models.IPAddressField(null=True, blank=True)
    content = models.TextField()
    status = models.CharField(choices=STATUS_CHOICES, max_length=25, db_index=True, default='approved')
    created = models.DateTimeField()
A: 

Yes. Just do it. This should work as expected:

self.thread_set.filter(active_status=1).aggregate(num_posts=Count('post'))['num_posts']

Any original query returns a QuerySet, so any available methods that return QuerySets can be can be pretty much indefinitely chained together for complex criteria matches. Since aggregate() does not return a QuerySet, you want to make sure that it is last in the chain.

jathanism
I am pretty sure that would filter the thread set and not the posts within each thread.
Jason Christa
Whoops, you are right. You would need to get a full list of the posts for a thread, then filter out the active posts, then aggregate those. But the other stuff about having `aggregate()` last still stands. I will poke around and then edit my answer.
jathanism
I don't have a code example yet, but it seems like what you may be looking for is this: http://docs.djangoproject.com/en/dev/topics/db/aggregation/#joins-and-aggregates (Joins and Aggregates)
jathanism
A: 

I have been looking into something similar and have not found a great solution. I'm using something like this:

def post_count(self):
        return len(Post.objects.filter(someModel = self).filter(active_status = 1))

It's not great, but I don't think that Django allows you to filter based on the secondary model aggregations and annotations. I'll be checking to see if anyone comes up with a better solution.

Zach
Why not Post.objects.filter(someModel=self, active_status=1).count()?
Jason Christa
Also I am using aggregate here because I want more than just the post count on a particular thread(represented by someModel in your example), I need the post count for all threads in a category, which is why I am using thread_set.
Jason Christa
Yeah the example I gave is pretty sloppy. Your code is definitely cleaner.The point I was trying to make though is that I'm pretty sure you need to work through the Post model though, and not the Thread model.
Zach
If a work on the post model and I have manually sum all the threads in a category. So if a category had 1000 threads, it would take at least that many queries to the db. Obviously that would never be acceptable and I would just do a raw SQL instead.
Jason Christa
A: 

You may want to look at writing a custom Manager object:

http://docs.djangoproject.com/en/1.1/topics/db/managers/

I haven't used aggregate(), but that may let you write a custom manager to provide a filtered active_thread_set and then do self.active_thread_set.aggregate(...). If not, it will let you do the custom SQL and add a num_posts property onto the Thread objects (see the PollManager.with_counts() example.)

Tom
+3  A: 

OK, now that the question includes the model definitions, I submit to you that this should work, unless your version of Django doesn't support some feature I use here (in which case, please let me know!):

Post.objects.filter(thread__in=thread_set, status='active').aggregate(num_posts=Count('id'))

Django allows __in filters to take a QuerySet to decide what the IN clause should look like in SQL, so if you pass thread__in=thread_set, Django will filter the posts so that only those whose thread field points to one of the ids of the threads in your thread_set remain for the aggregate call to see.

This should filter the posts with just one db query with something like WHERE thread_id IN ... inside, rather than with one query per thread, which would indeed be horrid. If anything else happened, this would be a bug in Django...

The result should be at most two queries to establish a Category's postcount -- one to obtain thread_set and another one actually to count the posts. The alternative is to have a thread/post join to be filtered based on Thread's category field and Post's status field, which I wouldn't necessarily expect to be that much faster. (I say 'at most', because I guess they could be fused automatically... Though I don't think this would happen with current Django. Can't check ATM, sorry.)

EDIT: Django's QuerySet API reference says this on __in filters:


IN

In a given list.

Example:

Entry.objects.filter(id__in=[1, 3, 4])

SQL equivalent:

SELECT ... WHERE id IN (1, 3, 4);

You can also use a queryset to dynamically evaluate the list of values instead of providing a list of literal values:

inner_qs = Blog.objects.filter(name__contains='Cheddar')
entries = Entry.objects.filter(blog__in=inner_qs)

This queryset will be evaluated as subselect statement:

SELECT ... WHERE blog.id IN (SELECT id FROM ... WHERE NAME LIKE '%Cheddar%')

The above code fragment could also be written as follows:

inner_q = Blog.objects.filter(name__contains='Cheddar').values('pk').query
entries = Entry.objects.filter(blog__in=inner_q)

Changed in Django 1.1: In Django 1.0, only the latter piece of code is valid.

This second form is a bit less readable and unnatural to write, since it accesses the internal query attribute and requires a ValuesQuerySet. If your code doesn't require compatibility with Django 1.0, use the first form, passing in a queryset directly.


So, I guess Django is capable of passing a single query to the db in the case at issue here. If the db's query analyser does a good job, the effect might be very nearly optimal. :-)

Michał Marczyk
This is the best solution so far. It may also be the best you can do with the django orm. I was just hoping there was an easy way to add a filter to the original self.thread_set.aggregate(num_posts=Count('post'))['num_posts'].
Jason Christa
Well, I finally got round to looking up the docs on __in filters with inner QuerySets... Looks like they should perform quite well, no? :-) Pity I can't seem to find a way to plug a filter straightforwardly into your code, as I can see how it might be a more natural fit to a particular way of thinking of what the code is saying... (Though on the other hand, you are counting posts, so going through Post seems ok to me.) Hope it works out for you performance-wise, at any rate. All best!
Michał Marczyk
A: 

Would it be ok to change around things a bit?

As illustrated below, you could add a post_count property to the Thread class, which counts active Posts in a Thread.

This post_count could then be used to calculate active posts in a category by adding up all active posts in all thread in a category.

class Category(models.Model):
    name = models.CharField(max_length=100)
    slug = models.SlugField(max_length=100, blank=True, primary_key=True)
    ordering = models.IntegerField(max_length=3, default=0)

    @property
    def thread_count(self):
        return self.thread_set.all().count()

    @property
    def post_count(self): # <-- Changed
        return reduce(lambda x,y: x + y, [x.post_count for x in self.thread_set.all()])

class Thread(models.Model):
    user = models.ForeignKey(User)
    category = models.ForeignKey(Category)
    title = models.CharField(max_length=100)
    slug = models.SlugField(max_length=100)
    content = models.TextField()
    created = models.DateTimeField(auto_now_add=True)
    latest_activity = models.DateTimeField(auto_now_add=True)

    @property
    def post_count(self): # <---- Newly added
        return self.post_set.filter(status = 'ACTIVE').count()

class Post(models.Model):
    thread = models.ForeignKey(Thread)
    parent = models.ForeignKey('Post', null=True, blank=True)
    display_name = models.CharField(max_length=100)
    email = models.EmailField(db_index=True)
    ip_address = models.IPAddressField(null=True, blank=True)
    content = models.TextField()
    status = models.CharField(choices=STATUS_CHOICES, max_length=25, db_index=True, default='approved')
    created = models.DateTimeField()
Mayuresh