views:

55

answers:

2

I have the following model

class Plugin(models.Model):
    name = models.CharField(max_length=50)
    # more fields

which represents a plugin that can be downloaded from my site. To track downloads, I have

class Download(models.Model):
    plugin = models.ForiegnKey(Plugin)
    timestamp = models.DateTimeField(auto_now=True)

So to build a view showing plugins sorted by downloads, I have the following query:

# pbd is plugins by download - commented here to prevent scrolling
pbd = Plugin.objects.annotate(dl_total=Count('download')).order_by('-dl_total')

Which works, but is very slow. With only 1,000 plugins, the avg. response is 3.6 - 3.9 seconds (devserver with local PostgreSQL db), where a similar view with a much simpler query (sorting by plugin release date) takes 160 ms or so.

I'm looking for suggestions on how to optimize this query. I'd really prefer that the query return Plugin objects (as opposed to using values) since I'm sharing the same template for the other views (Plugins by rating, Plugins by release date, etc.), so the template is expecting Plugin objects - plus I'm not sure how I would get things like the absolute_url without a reference to the plugin object.

Or, is my whole approach doomed to failure? Is there a better way to track downloads? I ultimately want to provide users some nice download statistics for the plugins they've uploaded - like downloads per day/week/month. Will I have to calculate and cache Downloads at some point?

EDIT: In my test dataset, there are somewhere between 10-20 Download instances per Plugin - in production I expect this number would be much higher for many of the plugins.

A: 

Annotations are obviously slow, as they need to update every record in the db.

One direct way would be to denormalize the db field. Use a download_count field on the plugin models that is incremented on the new save of Download. Use the sort by the aggregate query on Plugins.

If you think there are going to be too many downloads to update another record of the Plugin all the time, you can update the download_count field on the Plugin via a cron.

Lakshman Prasad
The downside to that approach is that I would lose the timestamps. This would allow me to track total downloads only.
Chris Lawlor
Chris, What I said was, override the save of the `Download` model, in which you can update the download count. You CAN HAZ BOTH!
Lakshman Prasad
A: 

That does seem unusually slow. There's nothing obvious in your query that would cause that slowness, though. I've done very similar queries in the past, with larger datasets, and they have executed in milliseconds.

The only suggestion I have for now is to install the Django debug toolbar, and in its SQL tab find the offending query and go to EXPLAIN to get the database to tell you exactly what it is doing when it executes. If it's doing subqueries, for example, check that they are using an index - if not, you may need to define one manually in the db. If you like, post the result of EXPLAIN here and I'll help further if possible.

Daniel Roseman
Interestingly, I tested the same code later on a sqlite DB and it was MUCH faster. I've lost the original PostgreSQL db so I guess I'll never find out what the problem was. Sometimes when I make changes to my models I'll just manually add or remove DB columns, usually without adding an index...
Chris Lawlor