Am building up a service that needs to maintain something along the lines of a case tracking system. Here's our model:
class Incident(models.Model):
title = models.CharField(max_length=128)
category = models.ForeignKey(Category)
status = models.ForeignKey(Status)
severity = models.ForeignKey(Severity)
owned_by = models.ForeignKey(User, related_name="owned_by", null=True, blank=True)
next_action = models.ForeignKey(IncidentAction)
created_date = models.DateTimeField()
created_by = models.ForeignKey(User, related_name="opened_by")
last_edit_date = models.DateTimeField(null=True, blank=True)
last_edit_by = models.ForeignKey(User, related_name="last_edit_by", null=True, blank=True)
closed_date = models.DateTimeField(null=True, blank=True)
closed_by = models.ForeignKey(User, related_name="Closed by", null=True, blank=True)
Because there are a lot of foreign keys being pulled into this model, it makes for interesting sql queries. We've been using as a trial the djblets data grid and the django debug toolbar, and are alarmed at the sheer number of queries being hit every time we add a new column for view that uses a foreign key, it does basically this type of query workflow:
#prepare the grid
select * from incident_table;
#render each row
for each row in incident table
for each column that is a foreign key select row from foreign table with id
It does an additional select query per row for each column that tries to pull a property for a foreign key.
I'm thinking that this is a universal problem with django and its ORM with regard to pulling in properties from foreign key models for display. As a test, I dropped the data grid and just did a simple list of properties for a queryset, and saw the queries balloon up in a similar way.
We're wanting to scale this up with tons of users hitting the model. As comparison, I made a similar view off the User model, and its full display is just done with one query because if you only pull fields from the given model, it doesn't do an additional db hit per additional column.
Some optimizations we tried were:
- django-orm-cache: Doesn't seem to work with django 1.0.4
- django-caching: This works well for caching the oft queried models
- view level caching with memcached
- Edit: using select_related() may speed up template rendering by not having a round trip back to the database, but it seems like it follows the foreign keys just ahead of time on the original queryset using the single query per foreign key. Just seems to move the multi database query hit ahead of time.
But there are some deeper questions we're soliciting the wisdom of the crowd on:
- For models with tons of foreign keys, what is the best way to make it efficiently query to get properties from foreign keys?
- Is caching the dependent models the only way to go using the above ORM caching systems?
- Or is this a standard case of outgrowing the ORM and needing to roll our own custom sql query with joins to get the desired datagrid output in as efficiently as possible?
Related questions that raised concerns on caching and foreign keys:
DB / performance: layout of django model that rarely refers to its parent more than once, Django ORM: caching and manipulating ForeignKey objects: