I'm developing a simple web app, and it makes a lot of sense to store some denormalized data.

Imagine a blogging platform that keeps track of Comments, and the BlogEntry model has a "CommentCount" field that I'd like to keep up to date.

One way of doing this would be to use Django signals.

Another way would be to put hooks directly in the code that creates and destroys Comment objects, synchronously calling methods on BlogEntry to increment/decrement the comment count.

I suppose there are other pythonic ways of accomplishing this with decorators or some other voodoo.

What is the standard Design Pattern for denormalizing in Django? In practice, do you also have to write consistency checkers and data fixers in case of errors?

A: 

Why not just get the set of comments, and find the number of elements, using the count() method:

count = blog_entry.comment_set.count()

Then you can pass that into your template.

Or, alternatively, in the template itself, you can do:

{{ blog_entry.comment_set.count }}

to get the number of comments.

mipadi
Every time I call count() it will do a "SELECT count(1) from Comment where ..." which will end up causing performance issues when there are a large number of comments.
slacy
How many people are leaving comments on your blog?
mipadi
+3  A: 

definitely use signals

Javier
Are signals "guaranteed"?
slacy
According to the documentation, there are some caveats for signals; failing to follow those guidelines will disable signals for specific models. More information at http://docs.djangoproject.com/en/dev/ref/signals/#module-django.db.models.signals
Roberto Liffredo
If your web server runs several processes, be aware that signals won't notify other processes. That can lead to concurrency issues.
vikingosegundo
Yeah, Apache+WSGI will certainly run multiple processes, so that makes me nervous.
slacy
+2  A: 

The first approach (signals) has the advantage of loosening the coupling between models.
However, signals are somewhat more difficult to maintain, because dependencies are less explicit (at least, in my opinion).
If the correctness of the comment count is not so important, you could also think of a cron job that updates it every n minutes.
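
Such a periodic job could be little more than one UPDATE that recomputes every count from the comments table, which also doubles as a consistency fixer. A rough sketch using the standard-library sqlite3 module; the table and column names (blog_entry, comment, entry_id, comment_count) are hypothetical:

```python
import sqlite3

# Recompute the stored count only for rows whose value has drifted.
RECOUNT_SQL = """
UPDATE blog_entry
SET comment_count = (
    SELECT COUNT(*) FROM comment WHERE comment.entry_id = blog_entry.id
)
WHERE comment_count != (
    SELECT COUNT(*) FROM comment WHERE comment.entry_id = blog_entry.id
)
"""


def repair_comment_counts(conn):
    """Fix drifted counts and return how many entries were corrected."""
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(RECOUNT_SQL)
        return cursor.rowcount
```

Logging the returned number on each run gives a cheap signal of how often the denormalized value actually goes wrong.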

However, no matter the solution, denormalizing will make maintenance more difficult; for this reason I would try to avoid it as much as possible, resorting instead to caches or other techniques. For example, using {% with comments.count as cnt %} in templates may improve performance quite a lot, since the count is queried once and then reused within the block.
Then, if everything else fails, and only in that case, think about what could be the best approach for the specific problem.

Roberto Liffredo
I understand the ins-and-outs of data normalization (and denormalization) but there are many cases where denormalized data can greatly increase query performance, which is why I'm thinking about it. My "comment count" example is synthetic, but serves as a good example for any denormalization proposal. Caching is a great idea, and I'll start to ponder that...
slacy
+10  A: 

You have managers in Django.

Use a customized manager to do creates and maintain the FK relationships.

The manager can update the counts as the sets of children are updated.

If you don't want to make customized managers, just extend the save method. Everything you want to do for denormalizing counts and sums can be done in save.

You don't need signals. Just extend save.

S.Lott
Great advice, that's what I did too
kender
I take this approach as well, haven't had problems so far.
Prairiedogg
Do you know of any good examples of this style? I'm amazed that the Django documentation (or Django Book) doesn't mention denormalization approaches at all...
slacy
+1  A: 

I found django-denorm to be useful. It uses database-level triggers instead of signals, but as far as I know, there is also a branch based on a different approach.
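
To give an idea of the trigger technique in general (this is not django-denorm's own code or API, just a hand-written sketch with the standard-library sqlite3 module and hypothetical table names), the database itself can keep the count in step with inserts and deletes:

```python
import sqlite3

SCHEMA = """
CREATE TABLE blog_entry (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    comment_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE comment (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL REFERENCES blog_entry(id),
    body TEXT NOT NULL
);
-- Keep the denormalized count in step with the comment table.
CREATE TRIGGER comment_added AFTER INSERT ON comment
BEGIN
    UPDATE blog_entry SET comment_count = comment_count + 1
    WHERE id = NEW.entry_id;
END;
CREATE TRIGGER comment_removed AFTER DELETE ON comment
BEGIN
    UPDATE blog_entry SET comment_count = comment_count - 1
    WHERE id = OLD.entry_id;
END;
"""


def make_db():
    """Return an in-memory database with the schema and triggers installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Because the triggers fire inside the database, they apply no matter which process or code path touches the comment table.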

gorsky