I've been playing around with the new aggregation functionality in the Django ORM, and there's a class of problem I think should be possible, but I can't seem to get it to work. The type of query I'm trying to generate is described here.

So, let's say I have the following models -

import datetime

from django.db import models

class ContactGroup(models.Model):
    .... whatever ....

class Contact(models.Model):
    group = models.ForeignKey(ContactGroup)
    name = models.CharField(max_length=20)
    email = models.EmailField()
...

class Record(models.Model):
    contact = models.ForeignKey(Contact)
    group = models.ForeignKey(ContactGroup)
    record_date = models.DateTimeField(default=datetime.datetime.now)

    ... name, email, and other fields that are in Contact ...

So, each time a Contact is created or modified, a new Record is created that saves the information as it appears in the contact at that time, along with a timestamp. Now, I want a query that, for example, returns the most recent Record instance for every Contact associated with a ContactGroup. In pseudo-code:

group = ContactGroup.objects.get(...)
records_i_want = group.record_set.most_recent_record_for_every_contact()

Once I get this figured out, I just want to be able to throw a filter(record_date__lt=some_date) on the queryset, and get the information as it existed at some_date.

Anybody have any ideas?

edit: It seems I'm not really making myself clear. Using models like these, I want a way to do the following with the pure Django ORM (no extra()):

ContactGroup.record_set.extra(where=["history_date = (select max(history_date) from app_record r where r.id=app_record.id and r.history_date <= '2009-07-18')"])

Putting the subquery in the where clause is only one strategy for solving this problem; the others are pretty well covered by the first link I gave above. I know where-clause subselects are not possible without using extra(), but I thought perhaps one of the other ways was made possible by the new aggregation features.

A: 

It sounds like you want to keep records of changes to objects in Django.

Pro Django has a section in chapter 11 (Enhancing Applications) in which the author shows how to create a model that uses another model as a client and tracks its inserts, deletes, and updates. The model is generated dynamically from the client definition and relies on signals. The code shows a most_recent() function, but you could adapt this to obtain the object state on a particular date.

I assume it is the tracking in Django that is problematic, not the SQL to obtain this, right?

hughdbrown
Actually, I am using the HistoricalRecords app from Pro Django to track history - I didn't mention it just to keep it simple. That app lets you do what I'm asking about for single object instances, but not for sets of objects, as I am trying to do.
A: 

First of all, I'll point out that:

ContactGroup.record_set.extra(where=["history_date = (select max(history_date) from app_record r where r.id=app_record.id and r.history_date <= '2009-07-18')"])

will not get you the same effect as:

records_i_want = group.record_set.most_recent_record_for_every_contact()

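The first query returns every record associated with a particular group (or associated with any of the contacts of a particular group) that has a record_date on or before the date/time specified in the extra. Because the subquery is correlated on id, the primary key, each max() is taken over a single row, so no record is ever excluded in favor of a newer one. Run this in the shell and then do this to review the query Django created: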

from django.db import connection
connection.queries[-1]['sql']

which reveals:

'SELECT "contacts_record"."id", "contacts_record"."contact_id", "contacts_record"."group_id", "contacts_record"."record_date", "contacts_record"."name", "contacts_record"."email" FROM "contacts_record" WHERE "contacts_record"."group_id" = 1 AND record_date = (select max(record_date) from contacts_record r where r.id=contacts_record.id and r.record_date <= \'2009-07-18\')'

Not exactly what you want, right?
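
To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of Django. The table layout and the four sample rows are made up for illustration; only the shape of the WHERE clause comes from the query above. It runs the same subquery twice, once correlated on id (as posted) and once correlated on contact_id (which is presumably what was intended):

```python
import sqlite3

# In-memory table with hypothetical sample data: two contacts, two records each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts_record (
        id INTEGER PRIMARY KEY,
        contact_id INTEGER,
        group_id INTEGER,
        record_date TEXT
    );
    INSERT INTO contacts_record (id, contact_id, group_id, record_date) VALUES
        (1, 1, 1, '2009-07-01'),
        (2, 1, 1, '2009-07-10'),
        (3, 2, 1, '2009-07-05'),
        (4, 2, 1, '2009-07-20');
""")

# Same shape as the extra() where clause; only the correlation column varies.
QUERY = """
    SELECT id FROM contacts_record
    WHERE group_id = 1
      AND record_date = (
          SELECT MAX(r.record_date) FROM contacts_record r
          WHERE r.{column} = contacts_record.{column}
            AND r.record_date <= '2009-07-18'
      )
    ORDER BY id
"""

def matching_ids(column):
    # `column` is a trusted literal in this sketch, never user input.
    return [row[0] for row in conn.execute(QUERY.format(column=column))]

by_id = matching_ids("id")               # every record on or before the cutoff
by_contact = matching_ids("contact_id")  # only the newest such record per contact

print(by_id)       # [1, 2, 3]
print(by_contact)  # [2, 3]
```

Correlating on id compares each row only with itself, so the date comparison is the only thing that filters. Correlating on contact_id keeps one row per contact, as long as no two records of a contact share the same timestamp.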

Now, the aggregation feature retrieves aggregated data, not the objects that the aggregated data came from. So if you're trying to use aggregation to minimize the number of queries needed for group.record_set.most_recent_record_for_every_contact(), you won't succeed.

Without using aggregation, you can get the most recent record for all contacts associated with a group using:

[x.record_set.all().order_by('-record_date')[0] for x in group.contact_set.all()]

Using aggregation, the closest I could get to that was:

from django.db.models import Max
group.record_set.values('contact').annotate(latest_date=Max('record_date'))

The latter yields dictionaries like:

[{'contact': 1, 'latest_date': somedate }, {'contact': 2, 'latest_date': somedate }]

That is one entry for each contact in the given group, holding the latest record date associated with it.
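
Those pairs still have to be mapped back to full records in a second step. The plain-Python sketch below mimics that on in-memory rows, with no Django involved. The sample rows and the helper names latest_dates and latest_records are hypothetical, and the optional cutoff stands in for a filter(record_date__lt=some_date) applied before the aggregation:

```python
from datetime import date

# Hypothetical rows standing in for Record instances.
records = [
    {"id": 1, "contact": 1, "record_date": date(2009, 7, 1)},
    {"id": 2, "contact": 1, "record_date": date(2009, 7, 10)},
    {"id": 3, "contact": 2, "record_date": date(2009, 7, 5)},
    {"id": 4, "contact": 2, "record_date": date(2009, 7, 20)},
]

def latest_dates(rows, cutoff=None):
    """Mimic values('contact').annotate(latest_date=Max('record_date'))."""
    latest = {}
    for row in rows:
        if cutoff is not None and row["record_date"] >= cutoff:
            continue  # skip rows that a record_date__lt filter would drop
        current = latest.get(row["contact"])
        if current is None or row["record_date"] > current:
            latest[row["contact"]] = row["record_date"]
    return [{"contact": c, "latest_date": d} for c, d in sorted(latest.items())]

def latest_records(rows, cutoff=None):
    """Second step: match each (contact, latest_date) pair back to a full row."""
    wanted = {(e["contact"], e["latest_date"]) for e in latest_dates(rows, cutoff)}
    return [r for r in rows if (r["contact"], r["record_date"]) in wanted]

print(latest_dates(records))
print([r["id"] for r in latest_records(records, cutoff=date(2009, 7, 18))])  # [2, 3]
```

If two records of the same contact share an identical timestamp, this lookup returns both, so a tie-breaker (for example the highest id) would be needed in that case.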

Anyway, the minimum number of queries with this approach is probably 1 + the number of contacts in the group. If you are interested in obtaining the result using a single query, that is also possible, but you'll have to construct your models in a different way. But that's a totally different aspect of your problem.

I hope this helps you understand how to approach the problem using aggregation and the regular ORM functions.

tarequeh