For all my data in the GAE Datastore, I have a model that keeps track of counters (the total number of records), since we can't use traditional SUM queries. I want to know the most efficient way of incrementing these global count values whenever I insert or delete a record. This is what I'm currently doing:

# Read the single counter entity, bump it, and write it back (no transaction).
counter = DBCounter.all().get()
counter.totalTopics += 1
counter.put()

But this seems quite sloppy to me. Any thoughts on a better way to do this?

+2  A: 

If you need to keep scalability while counting, you should look into Joe Gregorio's article on sharding counters and DocSavage's implementation of the idea.

AppEngineFan's excellent blog also has info on scalable non-sharded counters, see this one which uses task queues and points to the previous article on using cron jobs instead.

Alex Martelli
+3  A: 

There are a few issues with your approach:

  • It may undercount, since you don't use a transaction to update the counter atomically: two concurrent requests can read the same value and each write back that value plus one.
  • It is inefficient:
    • Contention may become a problem if you need to update this counter frequently. Since you only have one counter, it won't scale well. Datastore entities can only be written at a rate of at most 5 times per second.
    • You're writing to the datastore twice every time you insert a record. If you end up using transactions to fix the above problem, then you'll be making two round-trips to the datastore every time you insert the record (once to insert, and once to update the counter). You might be able to use an approach which avoids this extra round-trip to the datastore.
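
To make the undercounting issue concrete, here is a minimal sketch of the lost update. It deliberately does not touch the datastore: the `store` dict and the `read`/`write` helpers are hypothetical stand-ins for fetching and putting the counter entity, and the lock merely stands in for a datastore transaction (on App Engine you would wrap the read-modify-write in something like `db.run_in_transaction` instead).

```python
import threading

# In-memory stand-in for the single counter entity (not the datastore API).
store = {"totalTopics": 0}

def read():
    return store["totalTopics"]

def write(value):
    store["totalTopics"] = value

# Two requests interleave: both read the counter before either writes it.
seen_by_a = read()
seen_by_b = read()
write(seen_by_a + 1)
write(seen_by_b + 1)
lost_update_total = read()  # only 1, although two records were inserted

# Doing the read and the write as one atomic unit avoids the lost update.
# The lock plays the role a transaction would play in the datastore.
store["totalTopics"] = 0
lock = threading.Lock()

def increment_atomically():
    with lock:
        write(read() + 1)

increment_atomically()
increment_atomically()
atomic_total = read()  # both increments are kept
```

Note that making the update atomic fixes the accuracy problem only; every insert still funnels through the same entity, so the contention issue remains.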

Here are some alternative approaches (from least accurate [and fastest] to most accurate [and slowest]):

  • If you only need a rough count of the number of entities of a particular kind in the datastore, then you can use the Stats API. The counts you retrieve are not constantly updated, however.
  • If you need more granularity but are okay with a small possibility of occasionally under-counting, then you could use a memcache-enhanced counter. There are several good implementations discussed in this question. In particular, see the code in the comments in this recipe.
  • If you really want to avoid undercounting, then you should consider a sharded datastore counter. This will eliminate the contention issue from above.
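
As a rough illustration of the sharding idea in the last bullet, here is a minimal in-memory sketch. It is not the datastore implementation from the linked articles: the `ShardedCounter` class, the shard count of 20, and the dict that stands in for the shard entities are all assumptions made for illustration. In a real app each shard would be its own datastore entity, incremented inside a transaction.

```python
import random

NUM_SHARDS = 20  # hypothetical; more shards allow more concurrent writes

class ShardedCounter:
    """Spreads increments over several shards and sums them on read."""

    def __init__(self, name, num_shards=NUM_SHARDS):
        self.name = name
        self.num_shards = num_shards
        # Maps shard key -> partial count; stands in for the shard entities.
        self._shards = {}

    def _shard_key(self, index):
        return "%s-%d" % (self.name, index)

    def increment(self, delta=1):
        # Pick a shard at random so concurrent writers rarely collide
        # on the same entity.
        index = random.randint(0, self.num_shards - 1)
        key = self._shard_key(index)
        self._shards[key] = self._shards.get(key, 0) + delta

    def get_count(self):
        # The total is the sum of all partial counts.
        return sum(self._shards.values())

counter = ShardedCounter("totalTopics")
for _ in range(100):
    counter.increment()    # 100 inserts
counter.increment(-1)      # 1 delete
total = counter.get_count()
```

The trade-off is that reads get more expensive (every shard has to be fetched and summed), so the total is usually worth caching if it is read often.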
David Underhill
Excellent tips. Many thanks.
GivP