Consider the models:

class Author(models.Model):
    name = models.CharField(max_length=200, unique=True)

class Book(models.Model):
    pub_date = models.DateTimeField()
    author = models.ForeignKey(Author)

Now suppose I want to order all the books by, say, their pub_date. I would use order_by('pub_date'). But what if I want a list of all authors ordered according to who most recently published books?

It's really very simple when you think about it. It's essentially:

  • The author on top is the one who most recently published a book
  • The next one is the author whose newest book is the next most recent
  • And so on.

I could probably hack something together, but since this could grow big, I need to know that I'm doing it right.

Help appreciated!

Edit: Lastly, would the option of just adding a new field to each one to show the date of the last book and just updating that the whole time be better?

A: 
def remove_duplicates(seq):
    """Return the items of seq in their original order, without repeats."""
    seen = set()
    result = []
    for item in seq:
        if item in seen:
            continue
        seen.add(item)
        result.append(item)
    return result


# Get the author ids of all books, most recently published first
author_ids = Book.objects.order_by('-pub_date').values_list('author', flat=True)
# Keep only the first (most recent) occurrence of each author id
recent_authors = remove_duplicates(author_ids)
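
To show how the de-duplication step works, here is a minimal stand-alone sketch with no Django involved. The sample rows and the helper name first_occurrences are made up for illustration, and it assumes the rows are already sorted newest first, which is what ordering the books by descending pub_date would give.

```python
from datetime import date


def first_occurrences(seq):
    """Return the items of seq in order, keeping only the first time each appears."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothetical (author, pub_date) rows, already sorted newest first.
rows = [
    ("carol", date(2009, 11, 2)),
    ("alice", date(2009, 10, 15)),
    ("carol", date(2009, 6, 1)),
    ("bob", date(2008, 3, 9)),
    ("alice", date(2007, 1, 20)),
]

# Each author appears once, at the position of their newest book.
authors_by_recency = first_occurrences(author for author, _ in rows)
print(authors_by_recency)  # ['carol', 'alice', 'bob']
```

As for performance: the loop itself is a single pass with set lookups, but this approach still pulls one row per book out of the database, so the cost grows with the number of books rather than the number of authors.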
DevDevDev
Hey, looks like what I need. Just two questions, if I may: 1. How does it work, and 2. How is it in terms of performance?
Adi
Hmm, actually, sorry, I don't think this will work: the .distinct() call will interact with the list. Hang on, I am fixing it for you.
DevDevDev
There may be a better way. I could do this in SQL, so I know it can be done using Django models, but I can't think how off the top of my head.
DevDevDev
A: 

Or, you could play around with something like this:

Author.objects.filter(book__pub_date__isnull=False).order_by('-book__pub_date')

ayaz
Won't work, there may be many books with the same author.
Dmitry Risenberg
I am sorry, but why won't it work, regardless of whether an author has no books, one book, or more?
ayaz
A: 

from django.db.models import Max
Author.objects.annotate(max_pub_date=Max('books__pub_date')).order_by('-max_pub_date')

This requires Django 1.1 or later.

I also assumed you will add related_name='books' to the author field in the Book model, so the reverse relation is Author.books instead of Author.book_set, which is much more readable. With the models exactly as posted, the lookup would be 'book__pub_date' instead.

Ofri Raviv
A: 

Building on ayaz's solution, what about:

Author.objects.filter(book__pub_date__isnull=False).distinct().order_by('-book__pub_date')

Josh Ourisman
+1  A: 

Lastly, would the option of just adding a new field to each one to show the date of the last book and just updating that the whole time be better?

Actually it would! This is a normal denormalization practice and can be done like this:

class Author(models.Model):
    name = models.CharField(max_length=200, unique=True)
    latest_pub_date = models.DateTimeField(null=True, blank=True)

    def update_pub_date(self):
        try:
            # Store the date of the newest book, not the Book instance itself
            self.latest_pub_date = self.book_set.order_by('-pub_date')[0].pub_date
        except IndexError:
            self.latest_pub_date = None  # no books yet (or none left)
        self.save()

class Book(models.Model):
    pub_date = models.DateTimeField()
    author = models.ForeignKey(Author)

    def save(self, *args, **kwargs):
        super(Book, self).save(*args, **kwargs)
        self.author.update_pub_date()

    def delete(self, *args, **kwargs):
        super(Book, self).delete(*args, **kwargs)
        self.author.update_pub_date()

This is the third common option you have, besides the two already suggested:

  • doing it in SQL with a join and grouping
  • fetching all the books on the Python side and removing duplicates

Both of these options compute the latest pub_date from normalized data at the time you read it. Denormalization does this computation for each author at the time you write new data. The idea is that most web apps do reads more often than writes, so this approach is preferable.
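
For comparison, the read-time option of a join with grouping might look roughly like this in raw SQL. This is only a sketch using Python's built-in sqlite3 module: the table names, column names, and sample data are assumptions made for illustration, not what Django would generate for these models. It also shows why ordering a plain join is not enough, since the join returns one row per book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        pub_date TEXT,  -- ISO dates, so text comparison matches date order
        author_id INTEGER REFERENCES author(id)
    );
""")
conn.executemany(
    "INSERT INTO author (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
conn.executemany(
    "INSERT INTO book (pub_date, author_id) VALUES (?, ?)",
    [
        ("2009-10-15", 1),
        ("2007-01-20", 1),
        ("2008-03-09", 2),
        ("2009-11-02", 3),
        ("2009-06-01", 3),
    ],
)

# A plain join yields one row per book, so authors are repeated.
joined = [row[0] for row in conn.execute("""
    SELECT author.name
    FROM author JOIN book ON book.author_id = author.id
    ORDER BY book.pub_date DESC
""")]

# Grouping collapses the join to one row per author with their newest date.
grouped = conn.execute("""
    SELECT author.name, MAX(book.pub_date) AS latest
    FROM author JOIN book ON book.author_id = author.id
    GROUP BY author.id, author.name
    ORDER BY latest DESC
""").fetchall()

print(joined)   # ['carol', 'alice', 'carol', 'bob', 'alice']
print(grouped)  # [('carol', '2009-11-02'), ('alice', '2009-10-15'), ('bob', '2008-03-09')]
```

Note that the inner join leaves out authors who have no books at all; a LEFT JOIN would keep them, with a NULL latest date.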

One perceived downside is that the same data now lives in two places and has to be kept in sync. That usually horrifies database people :-). But it is rarely a problem as long as all writes go through your ORM models (which they probably do anyway). In Django it's the app that controls the database, not the other way around.

Another (more realistic) downside is that, with the naive code shown above, a mass update of books may be much slower, since every save pings the author to refresh its data whether or not the date changed. This is usually solved with a flag that temporarily disables the call to update_pub_date, which you then call manually afterwards. Basically, denormalized data requires more maintenance than normalized data.
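
The flag idea can be sketched with plain Python classes standing in for the models. Nothing here is Django API: the update_author parameter and the refresh_count counter are hypothetical names invented for this illustration, and a real implementation would hook into the model's save() instead.

```python
from datetime import date


class Author:
    """Plain-Python stand-in for the Author model (not a Django model)."""

    def __init__(self, name):
        self.name = name
        self.books = []
        self.latest_pub_date = None
        self.refresh_count = 0  # counts recomputations, for illustration only

    def update_pub_date(self):
        self.refresh_count += 1
        self.latest_pub_date = max(
            (book.pub_date for book in self.books), default=None
        )


class Book:
    """Plain-Python stand-in for the Book model."""

    def __init__(self, author, pub_date):
        self.author = author
        self.pub_date = pub_date

    def save(self, update_author=True):
        # update_author is the hypothetical flag that disables the refresh.
        self.author.books.append(self)
        if update_author:
            self.author.update_pub_date()


author = Author("alice")

# Bulk load: skip the per-book refresh ...
for year in (2005, 2007, 2009):
    Book(author, date(year, 1, 1)).save(update_author=False)

# ... and refresh once, manually, at the end.
author.update_pub_date()

print(author.refresh_count)    # 1
print(author.latest_pub_date)  # 2009-01-01
```

The trade-off is that whoever passes the flag takes on the responsibility of calling update_pub_date afterwards; if that call is forgotten, the denormalized field silently goes stale.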

isagalaev