Lastly, would the option of just adding a new field to each one to show the date of the last book and just updating that the whole time be better?
Actually, it would! This is a common denormalization practice and can be done like this:
from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=200, unique=True)
    latest_pub_date = models.DateTimeField(null=True, blank=True)

    def update_pub_date(self):
        try:
            self.latest_pub_date = self.book_set.order_by('-pub_date')[0].pub_date
            self.save()
        except IndexError:
            pass  # no books yet!


class Book(models.Model):
    pub_date = models.DateTimeField()
    author = models.ForeignKey(Author)

    def save(self, *args, **kwargs):
        super(Book, self).save(*args, **kwargs)
        self.author.update_pub_date()

    def delete(self, *args, **kwargs):
        super(Book, self).delete(*args, **kwargs)
        self.author.update_pub_date()
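The field then keeps itself up to date whenever a book is saved. A minimal usage sketch against the models above (the author name and objects here are made up for illustration):

from django.utils import timezone

# Book.save() runs as part of create(), so the author's latest_pub_date
# is refreshed right away.
author = Author.objects.create(name='Jane Doe')
Book.objects.create(author=author, pub_date=timezone.now())

author = Author.objects.get(pk=author.pk)  # re-fetch to see the stored value
print(author.latest_pub_date)              # the pub_date of the newest book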
This is the third common option you have besides the two already suggested (both sketched below):
- doing it in SQL with a join and grouping
- getting all the books to the Python side and removing duplicates
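Roughly, those two alternatives look like this with the Django ORM (a sketch against the models above; the annotation name `latest` and the `latest_by_author` dict are just illustrative):

from django.db.models import Max

# Option 1: let the database do the join and grouping.
# annotate() produces one query with a LEFT JOIN on book and a GROUP BY author.
for author in Author.objects.annotate(latest=Max('book__pub_date')):
    print(author.name, author.latest)   # latest is None for authors without books

# Option 2: pull the books into Python and keep only the newest one per author.
latest_by_author = {}
for book in Book.objects.order_by('pub_date'):
    # later (newer) books overwrite earlier ones, so each author ends up
    # mapped to the pub_date of their most recent book
    latest_by_author[book.author_id] = book.pub_date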
Both of those options compute the pub_dates from normalized data at the time you read them. Denormalization does that computation for each author at the time you write new data. The idea is that most web apps read far more often than they write, so this approach is usually preferable.
One of the perceived downsides is that you now have the same data in two places and have to keep it in sync. It usually horrifies database people to death :-). But this is rarely a problem as long as you work with the data through your ORM model (which you probably do anyway). In Django it's the app that controls the database, not the other way around.
Another (more realistic) downside is that with the naive code I've shown, a mass update of books can be much slower, since every single save pings the author to recompute its date, no matter what. This is usually solved by having a flag to temporarily disable calling update_pub_date and calling it manually afterwards. Basically, denormalized data requires more maintenance than normalized data.
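A minimal sketch of that flag, assuming it lives in the same models.py as the code above; the DENORMALIZE switch and the bulk_import helper are made up for this example, not a Django feature, and the Book model is the one from above with a single extra check in save():

DENORMALIZE = True  # module-level switch checked by Book.save()

class Book(models.Model):
    pub_date = models.DateTimeField()
    author = models.ForeignKey(Author)

    def save(self, *args, **kwargs):
        super(Book, self).save(*args, **kwargs)
        if DENORMALIZE:                     # the only change from the version above
            self.author.update_pub_date()

def bulk_import(rows):
    """Create many books without pinging the author on every save."""
    global DENORMALIZE
    DENORMALIZE = False
    try:
        for row in rows:
            Book.objects.create(**row)      # no per-book author updates here
    finally:
        DENORMALIZE = True
    # recompute the denormalized field once per author afterwards
    for author in Author.objects.all():
        author.update_pub_date()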