Hey guys, I've got a model with an id field that isn't unique. Each row also has a date. I would like to return all results, but only the most recent row for each id. The model looks something like this:

from django.db import models

class MyModel(models.Model):
    my_id = models.PositiveIntegerField()
    date  = models.DateTimeField()
    title = models.CharField(max_length=36)


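# Add some entries
from datetime import datetime, timedelta

today = datetime.now()
yesterday = today - timedelta(days=1)

m1 = MyModel(my_id=1, date=yesterday, title='stop')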
m1.save()

m2 = MyModel(my_id=1, date=today, title='go')
m2.save()

m3 = MyModel(my_id=2, date=today, title='hello')
m3.save()

Now try to retrieve these results:

MyModel.objects.all()... # then limit duplicate my_id's by most recent

Results should be only m2 and m3

+1  A: 

You won't be able to do this with just the ORM. You'll need to get all the records and then discard the duplicates in Python.

For example:

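# Newest first, so the first row seen for each my_id is the one to keep.
objs = MyModel.objects.all().order_by("-date")
seen = set()
keep = []
for o in objs:
    # Compare on my_id; the automatic primary key o.id is always unique.
    if o.my_id not in seen:
        keep.append(o)
        seen.add(o.my_id)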
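
If the rows are not already sorted, a single pass with a dict keyed on my_id does the same job. This is only a minimal sketch: it uses a hypothetical Row tuple in place of real MyModel instances so it runs without Django, and Row, latest_per_id, and the sample dates are made up for illustration.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical stand-in for MyModel rows, so the sketch runs without Django.
Row = namedtuple("Row", ["my_id", "date", "title"])


def latest_per_id(rows):
    """Return the most recent row for each my_id, in any input order."""
    latest = {}
    for row in rows:
        current = latest.get(row.my_id)
        # Keep the row if it is the first for this my_id or newer than the stored one.
        if current is None or row.date > current.date:
            latest[row.my_id] = row
    return list(latest.values())


# Made-up sample data mirroring m1, m2 and m3 from the question.
today = datetime(2010, 6, 2)
yesterday = today - timedelta(days=1)
rows = [
    Row(1, yesterday, "stop"),
    Row(1, today, "go"),
    Row(2, today, "hello"),
]
result = latest_per_id(rows)
```

Like the loop above, this still pulls every row out of the database first; it only changes how the duplicates are discarded.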

Here's some custom SQL that can get what you want from the database:

select * from mymodel where (my_id, date) in (select my_id, max(date) from mymodel group by my_id)

You should be able to adapt this to use in the ORM.
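
To try the idea outside Django, here is a rough sketch against an in-memory SQLite database. It groups on the my_id column from the question's model and uses a join instead of the row-value IN, which should behave the same way. The table name mymodel, the text dates, and the sample rows are assumptions for illustration; Django would normally name the table with the app label as a prefix.

```python
import sqlite3

# In-memory database with a table shaped like the question's model.
# The table name and the ISO-formatted text dates are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mymodel ("
    "id integer primary key, my_id integer, date text, title text)"
)
conn.executemany(
    "insert into mymodel (my_id, date, title) values (?, ?, ?)",
    [
        (1, "2010-06-01", "stop"),
        (1, "2010-06-02", "go"),
        (2, "2010-06-02", "hello"),
    ],
)

# Join every row against the newest date recorded for its my_id.
query = """
    select m.my_id, m.title
    from mymodel m
    join (
        select my_id, max(date) as max_date
        from mymodel
        group by my_id
    ) latest
      on m.my_id = latest.my_id and m.date = latest.max_date
    order by m.my_id
"""
rows = conn.execute(query).fetchall()
conn.close()
```

If two rows share both my_id and the newest date, this returns both of them. Inside Django, a query like this could probably be passed to MyModel.objects.raw(), as long as the select includes the primary key column.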

Ned Batchelder
As soon as you loop on the results, won't that evaluate the QuerySet and cause all the lookups? There's really no way to do it without?
Scott Willman
Relational databases (and therefore ORMs built on them) are not good at operations between rows (including comparisons). Their model is fundamentally about selecting a set of rows and then sorting them. I can't think of a way to get SQL to do what you want.
Ned Batchelder
Ok, thanks for taking the time. I suppose I'll limit the results in other ways (like only getting recent results) to reduce the weight. Thanks again, Ned!
Scott Willman
Wait: I added some custom SQL to the answer.
Ned Batchelder
Ooh thanks, I'll play with that a bit.
Scott Willman
A: 

You should also look into abstracting the logic above into a manager:

http://docs.djangoproject.com/en/dev/topics/db/managers/

That way you can call something like MyModel.objects.no_dupes() where you would define no_dupes() in a manager and do the logic Ned laid out in there.

Your models.py would now look like this:

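from django.db import models

class MyModelManager(models.Manager):
    def no_dupes(self):
        # Newest first, so the first row seen for each my_id is the one to keep.
        objs = self.all().order_by("-date")
        seen = set()
        keep = []
        for o in objs:
            # Compare on my_id; the automatic primary key o.id is always unique.
            if o.my_id not in seen:
                keep.append(o)
                seen.add(o.my_id)
        return keep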

class MyModel(models.Model):
    my_id = models.PositiveIntegerField()
    date  = models.DateTimeField()
    title = models.CharField(max_length=36)
    objects = MyModelManager()

With the above code in place, you can call MyModel.objects.no_dupes(), which should give you the desired result. It looks like you can even override what all() returns as well, if you would want that instead:

http://docs.djangoproject.com/en/1.2/topics/db/managers/#modifying-initial-manager-querysets

I find the manager to be a better solution if you need to use this in more than one view across the project. This way you don't have to rewrite the code X number of times.

Sarkis Varozian
Whether I want to put the filtering in a custom manager or in the view, don't I still have to get ALL records and then filter them? I'd really like to filter before making the actual db calls if possible. Is this possible?
Scott Willman
You can modify the actual SQL query using managers. Look at the example here: http://docs.djangoproject.com/en/dev/topics/db/managers/#adding-extra-manager-methods
Sarkis Varozian
Thanks for the tip on Model Managers, I hadn't considered making a custom manager.
Scott Willman