I have Django models set up in the following manner:

Model A has a one-to-many relationship to model B.

Each record in A has between 3,000 and 15,000 records in B.

What is the best way to construct a query that retrieves, for each record in A, the newest (greatest pk) corresponding record in B? Is this something that I must use SQL for in lieu of the Django ORM?

A: 

I don't think the Django ORM can do this (but I've been pleasantly surprised before...). If there's a reasonable number of A records (or if you're paging), I'd just add a method to the A model that returns this 'newest' B record. If you want to get a lot of A records, each with its own newest B, I'd drop to SQL.

Remember that no matter which route you take, you'll need a suitable composite index on the B table, maybe adding an ordering = ('a_fk', '-id') to the Meta subclass.
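
If you do drop to SQL, the usual shape is a greatest-per-group query: compute MAX(id) per parent, then join back to fetch the full rows. Here is a rough sketch using Python's built-in sqlite3 module. The table and column names (a, b, a_id, payload) are made up for the example and are not the names Django would generate for your models; the index on (a_id, id) is the composite index mentioned above.

```python
import sqlite3


def newest_b_per_a(conn):
    """Return {a_id: (b_id, payload)} holding the highest-id B row for each A."""
    rows = conn.execute(
        """
        SELECT b.a_id, b.id, b.payload
        FROM b
        JOIN (SELECT a_id, MAX(id) AS max_id FROM b GROUP BY a_id) AS newest
          ON b.id = newest.max_id
        """
    ).fetchall()
    return {a_id: (b_id, payload) for a_id, b_id, payload in rows}


# Hypothetical schema standing in for the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (
        id INTEGER PRIMARY KEY,
        a_id INTEGER NOT NULL REFERENCES a(id),
        payload TEXT
    );
    CREATE INDEX b_a_id_id ON b (a_id, id);
    INSERT INTO a (id) VALUES (1), (2);
    INSERT INTO b (id, a_id, payload) VALUES
        (1, 1, 'old'), (2, 2, 'only'), (3, 1, 'new');
    """
)

latest = newest_b_per_a(conn)
```

Note that an A record with no B records at all simply won't appear in the result, so you'd have to handle that case separately if it can occur.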

Javier
+1  A: 

Create a helper function for safely extracting the 'top' item from any queryset. I use this all over the place in my own Django apps.

def top_or_none(queryset):
    """Safely pulls off the top element in a queryset"""
    # Extracts a single element collection w/ top item
    result = queryset[0:1]

    # Return that element or None if there weren't any matches
    return result[0] if result else None

This uses a bit of a trick w/ the slice operator to add a limit clause onto your SQL.
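
The reason the slice is safe comes down to ordinary Python sequence behaviour, which querysets mirror here: slicing an empty sequence gives you back an empty sequence, while indexing it raises IndexError. A small sketch with a plain list standing in for the queryset (there is no database involved, so it shows nothing about the SQL that gets generated):

```python
def top_or_none(items):
    """Same logic as above, exercised on plain lists instead of querysets."""
    result = items[0:1]  # at most one element; never raises
    return result[0] if result else None


first = top_or_none(["newest", "older"])
missing = top_or_none([])

# Direct indexing on an empty sequence is what the slice avoids.
try:
    [][0]
    raised = False
except IndexError:
    raised = True
```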

Now use this function anywhere you need to get the 'top' item of a query set. In this case, you want to get the top B item for a given A where the B's are sorted by descending pk, as such:

latest = top_or_none(B.objects.filter(a=my_a).order_by('-pk'))

There's also the recently added 'Max' function in Django Aggregation which could help you get the max pk, but I don't like that solution in this case since it adds complexity.

P.S. I don't really like relying on the 'pk' field for this type of query, as some RDBMSs don't guarantee that sequential pks are the same as logical creation order. If I have a table that I know I will need to query in this fashion, I usually have my own 'creation' datetime column that I can order by instead of pk.
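
To make that concrete, here is a rough sketch with a hypothetical 'created' column. The row with the highest id was not the last one created (as might happen with imported or backfilled rows), so ordering by id and ordering by created disagree. The table, column names and timestamps are invented for the example, and it uses sqlite3 directly rather than the ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE b (
        id INTEGER PRIMARY KEY,
        a_id INTEGER NOT NULL,
        created TEXT NOT NULL   -- ISO 8601 text sorts chronologically
    );
    INSERT INTO b (id, a_id, created) VALUES
        (1, 1, '2009-06-01T10:00:00'),
        (2, 1, '2009-06-03T10:00:00'),
        (3, 1, '2009-06-02T10:00:00');
    """
)


def newest(conn, a_id, order_column):
    """Return the id of the top B row for a_id under the given ordering."""
    # order_column is interpolated into the SQL, so only allow known names.
    if order_column not in ("id", "created"):
        raise ValueError(order_column)
    row = conn.execute(
        f"SELECT id FROM b WHERE a_id = ? ORDER BY {order_column} DESC LIMIT 1",
        (a_id,),
    ).fetchone()
    return row[0] if row else None


by_pk = newest(conn, 1, "id")
by_created = newest(conn, 1, "created")
```

Here by_pk and by_created pick different rows, which is exactly the situation where relying on pk would give you the wrong 'latest' record.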

Edit based on comment:

If you'd rather use queryset[0], you can modify the 'top_or_none' function thusly:

def top_or_none(queryset):
    """Safely pulls off the top element in a queryset"""
    try:
        return queryset[0]
    except IndexError:
        return None

I didn't propose this initially because I was under the impression that queryset[0] would pull back the entire result set, then take the 0th item. Apparently Django adds a 'LIMIT 1' in this scenario too, so it's a safe alternative to my slicing version.

Edit 2

Of course you can also take advantage of Django's related manager construct here and build the queryset through your 'A' object, depending on your preference:

latest = top_or_none(my_a.b_set.order_by('-pk'))
Joe Holloway
What is the difference between result = queryset[0:1] and result = queryset[0]?
hekevintran
queryset[0:1] will return an empty list when there are no matching items, whereas queryset[0] will throw an IndexError.
Joe Holloway
Thanks for the answer!
hekevintran