tags:

views:

51

answers:

1

I run a lab annotation website where users can annotate samples with tags relating to disease, tissue type, etc. Here is a simple example from models.py:

from django.contrib.auth.models import User
from django.db import models


class Sample(models.Model):
    name = models.CharField(max_length = 255)
    tags=models.ManyToManyField('Tag', through = 'Annot')

class Tag(models.Model):
    name = models.CharField(max_length = 255)

class Annot(models.Model):
    tag = models.ForeignKey('Tag')
    sample = models.ForeignKey('Sample')
    user = models.ForeignKey(User, null = True)
    date = models.DateField(auto_now_add = True)

I'm looking for a query in django's ORM which will return the tags in which two users agree on the annotation of same tag. It would be helpful if I could supply a list of users to limit my query (if someone only believes User1 and User2 and wants to find the sample/tag pairs that only they agree on.)

+1  A: 

I think I understood what you need. This one made me think, thanks! :-)

I believe the equivalent SQL query would be something like:

select t.name, s.name, count(user_id) count_of_users
  from yourapp_annot a, yourapp_tag t, yourapp_sample s 
 where a.tag_id = t.id 
   and s.id = a.sample_id
group by t.name, s.name
having count_of_users > 1

While I try hard not to think in SQL when I'm coming up with django model navigation (it tends to get in the way); when it comes to aggregation queries it always helps me to visualize what the SQL would be.

In django we now have aggregations.

Here is what I came up with:

models.Annot.objects.select_related().values(
  'tag__name','sample__name').annotate(
  count_of_users=Count('user__id')).filter(count_of_users__gt=1)

The result set will contain the tag, the sample, and the count of users that tagged said sample with said tag.

Breaking it apart for the folks that are not used to django aggregation:

models.Annot.objects.select_related()

  • select_related() is forcing all tables related to Annot to be retrieved in the same query
  • This is what will allow me to specify tag__name and sample__name in the values() call

values('tag__name','sample__name')

  • values() is limiting the fields to retrieve to tag.name and sample.name
  • This makes sure that my aggregation on count of clients will group by just these fields

annotate(count_of_users=Count('user__id'))

  • annotate() adds an aggregation as an extra field to a query

filter(count_of_users__gt=1)

  • And finally I filter on the aggregate count.

If you want to add an additional filter on what users should be taken into account, you need to do this:

models.Annot.objects.filter(user=[... list of users...]).select_related().values(
  'tag__name','sample__name').annotate(
  count_of_users=Count('user__id')).filter(count_of_users__gt=1)

I think that is it.


One thing... Notice that I used tag__name and sample__name in the query above. But your models do not specify that tag names and sample names are unique.

Should they be unique? Add a unique=True to the field definitions in the models.

Shouldn't they be unique? You need to replace tag__name and sample__name with tag__id and sample__id in the query above.

celopes
looks great ... with a great explanation, checking it now
JudoWill
works perfectly, couldn't have asked for anything better!
JudoWill
Happy it helped. Check my edit about uniqueness, please.
celopes
actually I changed it to:.values('tag','sample')which is equivalent to 'tag__id' and 'sample__id'
JudoWill