views:

125

answers:

2

I read today about sharded counters in Google App Engine. The article says that you should expect to max out at about 5/updates per second per entity in the data store. But it seems to me that this solution doesn't 'scale' unless you have some way of knowing how many updates you are doing per second. For example, you can allocate 10 shards, but will then start choking at 50 updates per second.

So how do you know how fast the updates are coming, and how do you feed that number back into the number of shards?

My guess is that along with the counter you could keep some record of recent activity, and if you detect a spike you can increase the number of shards. Is that generally how it's done? And if so, why isn't it done in the sample code? (That last question may be unanswerable.) Is it more common practice to monitor website activity and update shard counts as traffic rises, as opposed to doing it automatically in the code?

Update: What are the practical consequences effects of having too few shards and choking? Does it simply mean that the website becomes unresponsive, or is it possible to lose counter updates because of timeouts?


As an aside, this question talks about implementing counters without sharding, but one of the answers impies that even memcache needs to be sharded if traffic is high. So this issue of shard allocation and tuning seems to be important.

+1  A: 

It is clearly simpler to manually monitor your website's popularity and increase the number of shards as needed. I would guess that most sites take this approach. Doing it programatically would not only be difficult, but it sounds like it would add an unacceptable amount of overhead to keep a record of all recent activity and try to analyze it to dynamically adjust the number of shards you're using.

I would prefer the simpler approach of just erring a little on the high side with the number of shards you choose.

You are correct about the practical consequences of having too few shards. Updating a datastore entity more frequently than possible which will initially cause some requests to take a long time (while the writes retry). If you have enough of them pile up, then they will start to fail as requests time out. This will certainly lead to missed counters. On the upside, your page will be so slow that users should start leaving which should relieve the pressure on the datastore :).

David Underhill
But but but .. if there have been timeouts, my counters will be wrong. I admit this will not result in any loss of life, but it bothers me just a little bit. Is it just one of those things that we have to live with?
brainjam
Living with the possibility of a few counter misses might not be so bad. Just try to choose the number of shards to accommodate your maximum expected peak traffic plus some safety margin. The more important missed counts are, the higher your safety margin should be.
David Underhill
+1  A: 

To address the last part of your question: Your memcache values will not require sharding. A single memcache server can handle tens of thousands of QPS of fetches and updates, so no plausibly large app is going to need to shard its memcache keys.

Nick Johnson
Excellent, thanks for the numbers!
David Underhill