Django's ORM (version 1.2.3) does not preserve identity when following foreign keys back and forth. This is best explained with an example:

from django.db import models

class Parent(models.Model):
    pass

class Child(models.Model):
    parent = models.ForeignKey(Parent)

parent = Parent.objects.get(id=1)
for child in parent.child_set.all():
    # The two ids differ: child.parent is a separate Parent instance
    print id(child.parent), "!=", id(parent)

So, for each child the parent is re-fetched from the database, even though we know the parent at the moment we fetch the child. This is counterintuitive to me.

In my case this also leads to performance issues, since I do some heavy operations at the parent level which I'd like to cache at object instance level. However, since the results of these calculations are accessed via the child => parent link, this caching at the parent level is useless.

Any ideas on how to solve this?

I've gotten as far as figuring out there's a ForeignRelatedObjectsDescriptor and a ReverseSingleRelatedObjectDescriptor.

A: 

Django's ORM does not link objects fetched through a reverse relation back to the instance they were fetched from. This means that the first time you access child.parent on each child, it makes a new database call.

One way to solve this in some (but not all) situations is to filter Child objects and use select_related() while doing so. This reduces the number of database calls: the child and parent tables are joined at query execution time, so no separate query is fired when child.parent is accessed.

For example:

from django.db import connection

# connection.queries is only populated when settings.DEBUG is True
parent = Parent.objects.get(id=1)
print parent
print len(connection.queries) # say, X

# select_related() joins the parent table into the child query
children = Child.objects.select_related().filter(parent=parent)
for child in children:
    print child.parent

print len(connection.queries) # should be X + 1

The Python object id of the parent and child.parent won't be the same but you will see that no additional queries are fired when you access child.parent.

Manoj Govindan
The trouble with this is you're still adding unnecessary work to the database - in this case, a JOIN - which, although not as expensive as a whole separate query, does add some weight. Also, the OP said he had done some expensive calculations on the `parent` object already - these would not carry over.
Daniel Roseman
You are correct. The calculations in particular make this a less than effective solution.
Manoj Govindan
As Daniel said: this wouldn't really solve my problem because of the expensive calculations in "parent".
Klaas van Schelven
+4  A: 

There are a number of possible solutions to this.

Perhaps the easiest is to keep track of the parent yourself:

parent = Parent.objects.get(id=1)
for child in parent.child_set.all():
    # Point the child's foreign key cache at the parent we already have
    child._parent_cache = parent

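_FOO_cache (where FOO is the name of the ForeignKey field, here parent) is the attribute Django uses to keep track of items fetched via ForeignKey, so if you pre-populate it on the child with the parent you already have, Django won't fetch it again when you reference child.parent. Note that this is a private implementation detail, so it may change between Django versions.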
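To make the mechanism a bit more concrete, here is a rough, Django-free sketch of how such a per-instance cache can behave. It is not Django's actual descriptor code: CachedRelation, load_parent and the queries list are hypothetical names invented for this illustration, and Parent and Child are plain classes rather than models.

```python
# Simplified stand-in for a foreign key descriptor; NOT Django's real code.
class CachedRelation(object):
    def __init__(self, name, loader):
        self.cache_name = "_%s_cache" % name  # e.g. "_parent_cache"
        self.loader = loader                  # hypothetical "database" lookup

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        try:
            # Cache hit: return the stored object without calling the loader.
            return getattr(instance, self.cache_name)
        except AttributeError:
            # Cache miss: load once, then remember the result on the instance.
            related = self.loader(instance)
            setattr(instance, self.cache_name, related)
            return related


class Parent(object):
    def __init__(self, pk):
        self.pk = pk


queries = []  # records every simulated database hit


def load_parent(child):
    queries.append(child.parent_id)
    return Parent(child.parent_id)  # a fresh object each time, like the ORM


class Child(object):
    parent = CachedRelation("parent", load_parent)

    def __init__(self, parent_id):
        self.parent_id = parent_id


parent = Parent(1)
plain = Child(parent_id=1)
primed = Child(parent_id=1)
primed._parent_cache = parent  # pre-populate, as in the loop above

plain_is_same = plain.parent is parent    # False: a new Parent was loaded
primed_is_same = primed.parent is parent  # True: the cache was used
```

In this sketch only the unprimed child triggers a simulated query; the primed child hands back the very same parent object, so anything cached on that object stays available.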

Alternatively, you could look into one of the third-party libraries that attempt to fix this - django-idmapper or django-selectreverse are two I know of.
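For what it's worth, the general idea of an identity map can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the API of django-idmapper or django-selectreverse: IdentityMap, its get() method and the summary() method are hypothetical names, and Parent(pk) stands in for a real database fetch.

```python
class Parent(object):
    """Plain stand-in for a model instance (hypothetical, not a Django model)."""

    def __init__(self, pk):
        self.pk = pk
        self.computations = 0  # counts how often the heavy work actually runs
        self._summary = None

    def summary(self):
        # Instance-level cache for the heavy operation.
        if self._summary is None:
            self.computations += 1
            self._summary = "summary for parent %d" % self.pk
        return self._summary


class IdentityMap(object):
    """Hands out one shared instance per primary key."""

    def __init__(self, loader):
        self._loader = loader
        self._instances = {}

    def get(self, pk):
        # Load on first request only; later requests reuse the same object.
        if pk not in self._instances:
            self._instances[pk] = self._loader(pk)
        return self._instances[pk]


parents = IdentityMap(Parent)  # Parent(pk) stands in for a database fetch

first = parents.get(1)
second = parents.get(1)
first.summary()
second.summary()  # reuses the result cached on the shared instance
```

Because both lookups return the same object, the expensive calculation runs once, which is the property the question is after.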

Daniel Roseman
Thanks. I indeed implemented the "keeping track of the parent yourself" option before I noticed your reply. Both of the other solutions seem promising as well though, so I'll probably look into them next time I run into this.
Klaas van Schelven
+1 for _Foo_cache.
Manoj Govindan