views:

465

answers:

2

Projectfundingdetail has a foreign key to project.

The following query gives me the list of all projects that have any projectfundingdetail under 1000. How do I limit it to latest projectfundingdetail only.

projects_list.filter(projectfundingdetail__budget__lte=1000).distinct()

I have defined the following function,

def latest_funding(self):
    return self.projectfundingdetail_set.latest(field_name='end_date')

But I cant use the following as latest_funding is not a database field

projects_list.filter(latest_funding__budget__lte=1000).distinct()

So what query should I use to get all projects that have only their latest projectfundingdetail under 1000.

A: 

Django QuerySets can be chained together. You can do something like

projects_list.filter(projectfundingdetail__budget__lte=1000).distinct().latest('end_date')
abhinavg
I don't think this meets the criteria. OP wants latest Projectfundingdetail _per-project_ (if budget < 1000); this query gives the latest Project. In fact you've "moved" the end_date field from Projectfundingdetail to Project.
Carl Meyer
@Carl Meyer: What you said is right. Its a wrong query.
Lakshman Prasad
+2  A: 

This query is harder than it looks at first glance. AFAIK the Django ORM does not provide any way to generate efficient SQL for this query, because the efficient SQL requires a correlated subquery. (I'd love to be corrected on this!) You can generate some ugly SQL with this query:

Projectfundingdetail.objects.annotate(latest=Max('project__projectfundingdetail__end_date')).filter(end_date=F('latest')).filter(budget__lte==1000).select_related()

But this requires to join from Projectfundingdetail to Project and back again, which is inefficient (though perhaps adequate for your needs).

The other way to do this is to write raw SQL and encapsulate it in a manager method. It looks a little bit scary but works great. If you assign the manager as "objects" attribute on Projectfundingdetail, you can use it like this to get the latest funding details for each project:

>>> Projectfundingdetail.objects.latest_by_project()

And it returns a normal QuerySet, so you can add on further filters:

>>> Projectfundingdetail.objects.latest_by_project().filter(budget__lte=1000)

Here's the code:

from django.db import connection, models
qn = connection.ops.quote_name

class ProjectfundingdetailManager(models.Manager):
    def latest_by_project(self):
        project_model = self.model._meta.get_field('project').rel.to

        names = {'project': qn(project_model._meta.db_table),
                 'pfd': qn(self.model._meta.db_table),
                 'end_date': qn(self.model._meta.get_field('end_date').column),
                 'project_id': qn(self.model._meta.get_field('project').column),
                 'pk': qn(self.model._meta.pk.column),
                 'p_pk': qn(project_model._meta.pk.column)}

        sql = """SELECT pfd.%(pk)s FROM %(project)s AS p 
                 JOIN %(pfd)s AS pfd ON p.%(p_pk)s = pfd.%(project_id)s
                 WHERE pfd.%(end_date)s =
                     (SELECT MAX(%(end_date)s) FROM %(pfd)s 
                      WHERE %(project_id)s = p.%(p_pk)s)
              """ % names

        cursor = connection.cursor()
        cursor.execute(sql)
        return self.model.objects.filter(id__in=[r[0] for r
                                                 in cursor.fetchall()])

About half of that code (the "names" dictionary) is only necessary to be robust against the possibility of nonstandard database table and column names. You could also just hardcode the table and column names into the SQL if you're confident they won't ever change.

Carl Meyer
Thanks so much. I knew it was going to be complicated, thanks for validating it. Even for the initial multiple joining query, the annotate() function is only available in the trunk. Not in any tagged release.
Lakshman Prasad
Otherwise, I could simply add is_latest field in the model that stores if the project is latest and include is_latest=true as another db filter condition. How good (or bad) is this approach.
Lakshman Prasad
annotate() will be available in upcoming 1.1 release (already available in tagged 1.1beta). The raw-SQL approach is, I believe, fully 1.0-compatible. And yes, the denormalization approach would be another good option to consider, depending on your balance of writes vs reads and your performance needs in each case. No way to make a good decision without actually benchmarking your use cases.
Carl Meyer