views:

355

answers:

3

I have a two models:

class Category(models.Model):
    pass

class Item(models.Model):
    cat = models.ForeignKey(Category)

I am trying to return all Categories for which all of that category's items belong to a given subset of item ids (fixed thanks). For example, all categories for which all of the items associated with that category have ids in the set [1,3,5].

How could this be done using Django's query syntax (as of 1.1 beta)? Ideally, all the work should be done in the database.

A: 

lets say you require all items to be in the following set:

allowable_items = set([1,3,4])

one bruteforce solution would be to check the item_set for every category as so:

categories_with_allowable_items = [
    category for category in 
    Category.objects.all() if
    set([item.id for item in category.item_set.all()]) <= allowable_items
]

but we don't really have to check all categories, as categories_with_allowable_items is always going to be a subset of the categories related to all items with ids in allowable_items... so that's all we have to check (and this should be faster):

categories_with_allowable_items = set([
    item.category for item in
    Item.objects.select_related('category').filter(pk__in=allowable_items) if
    set([siblingitem.id for siblingitem in item.category.item_set.all()]) <= allowable_items
])

if performance isn't really an issue, then the latter of these two (if not the former) should be fine. if these are very large tables, you might have to come up with a more sophisticated solution. also if you're using a particularly old version of python remember that you'll have to import the sets module

ozan
This would work, though ideally I am looking for a way to push all the work to the database.
Derek Dahmer
A: 

I've played around with this a bit. If QuerySet.extra() accepted a "having" parameter I think it would be possible to do it in the ORM with a bit of raw SQL in the HAVING clause. But it doesn't, so I think you'd have to write the whole query in raw SQL if you want the database doing the work.

EDIT:

This is the query that gets you part way there:

from django.db.models import Count
Category.objects.annotate(num_items=Count('item')).filter(num_items=...)

The problem is that for the query to work, "..." needs to be a correlated subquery that looks up, for each category, the number of its items in allowed_items. If .extra had a "having" argument, you'd do it like this:

Category.objects.annotate(num_items=Count('item')).extra(having="num_items=(SELECT COUNT(*) FROM app_item WHERE app_item.id in % AND app_item.cat_id = app_category.id)", having_params=[allowed_item_ids])
Carl Meyer
In Django 1.1, they have added an annotate() function that will let you use the result of aggregation functions in filters.For example, Category.objects.annotate(num_items=Count('item')).filter(num_items=2) would return the categories that have 2 items. (just tested it)
Derek Dahmer
I'm well aware of the new annotate functionality (spent quite a while playing with that very query trying to answer this question). What it doesn't let you do is the correlated subquery to determine what should be in place of 2 (the number of items in allowed_items that link to each category).
Carl Meyer
A: 
Category.objects.filter(item__id__in=[1, 3, 5])

Django creates the reverse relation ship on the model without the foreign key. You can filter on it by using its related name (usually just the model name lowercase but it can be manually overwritten), two underscores, and the field name you want to query on.

Jason Christa
True, but not the OP's question: he doesn't want Categories with *any* item ID in the set, but Categories for which *all* related items are in the item ID set.
AdamKG
In that case this might work:Category.objects.filter(item__id=1).filter(item__id=3).filter(item__id=5)
Jason Christa