Long story short, I'm rewriting a piece of a system and am looking for a way to store some hit counters in AWS SimpleDB.

For those of you not familiar with SimpleDB, the (main) problem with storing counters is that the cloud propagation delay is often over a second. Our application currently gets ~1,500 hits per second. Not all those hits will map to the same key, but a ballpark figure might be around 5-10 updates to a key every second. This means that if we were to use a traditional update mechanism (read, increment, store), we would end up inadvertently dropping a significant number of hits.

One potential solution is to keep the counters in memcache and use a cron task to push the data. The big problem with this is that it isn't the "right" way to do it: memcache shouldn't really be used for persistent storage... after all, it's a caching layer. In addition, we'll end up with issues when we do the push: making sure we delete the correct elements, and hoping there is no contention for them as we're deleting them (which is very likely).

Another potential solution is to keep a local SQL database and write the counters there, updating our SimpleDB out-of-band every so many requests or running a cron task to push the data. This solves the syncing problem, as we can include timestamps to easily set boundaries for the SimpleDB pushes. Of course, there are still other issues, and though this might work with a decent amount of hacking, it doesn't seem like the most elegant solution.

Has anyone encountered a similar issue in their experience, or have any novel approaches? Any advice or ideas would be appreciated, even if they're not completely fleshed out. I've been thinking about this one for a while and could use some new perspectives.

+10  A: 

The existing SimpleDB API does not lend itself naturally to being a distributed counter. But it certainly can be done.

Working strictly within SimpleDB, there are two ways to make it work: an easy method that requires something like a cron job to clean up, or a much more complex technique that cleans as it goes.

The Easy Way

The easy way is to make a different item for each "hit", with a single attribute which is the key. You can pump items into the domain(s) quickly and easily. When you need to fetch the count (presumably much less often), you have to issue a query:

SELECT count(*) FROM domain WHERE key='myKey'

Of course, this will cause your domain(s) to grow unbounded, and the queries will take longer and longer to execute over time. The solution is a summary record where you roll up all the counts collected so far for each key. It's just an item with attributes for the key {summary='myKey'}, the rolled-up count, and a "Last-Updated" timestamp with granularity down to the millisecond. This also requires that you add a "timestamp" attribute to your "hit" items. The summary records don't need to be in the same domain. In fact, depending on your setup, they might best be kept in a separate domain. Either way, you can use the key as the itemName and use GetAttributes instead of doing a SELECT.
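
For concreteness, here is a rough sketch of what writing one "hit" item could look like with the AWS SDK for Java. The domain name, attribute names, and zero-padded timestamp format are assumptions for illustration, not something prescribed by the answer.

import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.model.PutAttributesRequest;
import com.amazonaws.services.simpledb.model.ReplaceableAttribute;

import java.util.Arrays;
import java.util.UUID;

public class HitWriter {
    private final AmazonSimpleDB sdb;
    private final String hitDomain;

    public HitWriter(AmazonSimpleDB sdb, String hitDomain) {
        this.sdb = sdb;
        this.hitDomain = hitDomain;
    }

    // Record one hit as its own item: a random item name plus 'key' and 'timestamp' attributes.
    public void recordHit(String key) {
        // Zero-padded millisecond timestamp so lexicographic comparison in SELECT matches numeric order.
        String timestamp = String.format("%013d", System.currentTimeMillis());
        sdb.putAttributes(new PutAttributesRequest(
                hitDomain,
                UUID.randomUUID().toString(),   // unique item name per hit
                Arrays.asList(
                        new ReplaceableAttribute("key", key, true),
                        new ReplaceableAttribute("timestamp", timestamp, true))));
    }
}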

Now getting the count is a two-step process. You have to pull the summary record, also query for hits with a 'timestamp' strictly greater than the 'Last-Updated' time in your summary record, and add the two counts together:

SELECT count(*) FROM domain WHERE key='myKey' AND timestamp > '...'

You will also need a way to update your summary record periodically. You can do this on a schedule (every hour) or dynamically, based on some other criteria (for example, do it during regular processing whenever the query returns more than one page). Just make sure that when you update your summary record, you base it on a time that is far enough in the past to be beyond the eventual consistency window. One minute is more than safe.
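
And a sketch of the two-step read under the same assumptions: GetAttributes on the summary item (named after the key), then a count over the hits newer than the summary's Last-Updated time. A long-running count(*) can paginate via NextToken; that is omitted here to keep the sketch short.

import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.model.Attribute;
import com.amazonaws.services.simpledb.model.GetAttributesRequest;
import com.amazonaws.services.simpledb.model.GetAttributesResult;
import com.amazonaws.services.simpledb.model.Item;
import com.amazonaws.services.simpledb.model.SelectRequest;
import com.amazonaws.services.simpledb.model.SelectResult;

public class CountReader {
    private final AmazonSimpleDB sdb;
    private final String hitDomain;
    private final String summaryDomain;

    public CountReader(AmazonSimpleDB sdb, String hitDomain, String summaryDomain) {
        this.sdb = sdb;
        this.hitDomain = hitDomain;
        this.summaryDomain = summaryDomain;
    }

    public long getCount(String key) {
        // Step 1: pull the summary record (itemName == key) with its rolled-up count and Last-Updated time.
        GetAttributesResult summary = sdb.getAttributes(
                new GetAttributesRequest(summaryDomain, key)
                        .withAttributeNames("count", "lastUpdated"));
        long rolledUp = 0;
        String lastUpdated = "0000000000000";
        for (Attribute a : summary.getAttributes()) {
            if ("count".equals(a.getName())) rolledUp = Long.parseLong(a.getValue());
            if ("lastUpdated".equals(a.getName())) lastUpdated = a.getValue();
        }

        // Step 2: count the hit items written since the summary was last rolled up
        // (key is assumed to contain no quote characters).
        SelectResult recent = sdb.select(new SelectRequest(
                "select count(*) from `" + hitDomain + "` where `key` = '" + key
                        + "' and `timestamp` > '" + lastUpdated + "'"));
        long newer = 0;
        for (Item item : recent.getItems()) {
            for (Attribute a : item.getAttributes()) {
                // count(*) results come back as a 'Count' attribute
                if ("Count".equals(a.getName())) newer += Long.parseLong(a.getValue());
            }
        }
        return rolledUp + newer;
    }
}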

This solution works in the face of concurrent updates because even if many summary records are written at the same time, they are all correct and whichever one wins will still be correct because the count and the 'Last-Updated' attribute will be consistent with each other.

This also works well across multiple domains, even if you keep your summary records with the hit records: you can pull the summary records from all your domains simultaneously and then issue your queries to all domains in parallel. The reason to do this is if you need higher throughput for a key than you can get from one domain.

This works well with caching. If your cache fails you have an authoritative backup.

The time will come where someone wants to go back and edit / remove / add a record that has an old 'Timestamp' value. You will have to update your summary record (for that domain) at that time or your counts will be off until you recompute that summary.

This will give you a count that is in sync with the data currently viewable within the consistency window. This won't give you a count that is accurate up to the millisecond.

The Hard Way

The other way is to do the normal read-increment-store mechanism, but also write a composite value that includes a version number along with your count, where the version number you use is one greater than the version number of the value you are updating.

get(key) returns the attribute value="Ver015 Count089"

Here you retrieve a count of 89 that was stored as version 15. When you do an update you write a value like this:

put(key, value="Ver016 Count090")

The previous value is not removed, and you end up with an audit trail of updates that is reminiscent of Lamport clocks.
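
As a minimal sketch of one such write, assuming the "VerNNN CountNNN" composite format shown above and the AWS SDK for Java: putting the new value with replace=false leaves the older values on the item, which is what produces the audit trail described here.

import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.model.PutAttributesRequest;
import com.amazonaws.services.simpledb.model.ReplaceableAttribute;

import java.util.Collections;

public class VersionedWrite {
    // Given the latest composite value, e.g. "Ver015 Count089", write "Ver016 Count090".
    public static void incrementAndPut(AmazonSimpleDB sdb, String domain, String key, String latest) {
        int version = Integer.parseInt(latest.substring(3, 6));   // "015" -> 15
        int count = Integer.parseInt(latest.substring(12));       // "089" -> 89
        String next = String.format("Ver%03d Count%03d", version + 1, count + 1);

        // replace=false adds the new value alongside the existing ones instead of overwriting them,
        // so the earlier versions remain on the item as the audit trail.
        sdb.putAttributes(new PutAttributesRequest(domain, key,
                Collections.singletonList(new ReplaceableAttribute("value", next, false))));
    }
}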

This requires you to do a few extra things.

  1. The ability to identify and resolve conflicts whenever you do a GET.
  2. A simple version number isn't going to work; you'll want to include a timestamp with resolution down to at least the millisecond, and maybe a process ID as well.
  3. In practice, you'll want your value to include the current version number and the version number of the value your update is based on, to more easily resolve conflicts.
  4. You can't keep an infinite audit trail in one item, so you'll need to issue deletes for older values as you go.

What you get with this technique is like a tree of divergent updates: you'll have one value, and then all of a sudden multiple updates will occur, and you will have a bunch of updates based off the same old value, none of which know about each other.

When I say resolve conflicts at GET time I mean that if you read an item and the value looks like this:

      11 --- 12
     /
10 --- 11
     \
       11

You have to be able to figure out that the real value is 14: three concurrent updates each added 1 to the base value of 10, and one further update added 1 more on top of one of them, so the true count is 10 + 4 = 14. You can do this if you include, with each new value, the version of the value(s) you are updating.
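
Here is a toy sketch of that resolution step, independent of any SimpleDB API and using invented version labels: if each stored value carries its own version, the version it was based on, and its count, the true count is the base count plus the sum of each update's delta over the value it was based on.

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CounterResolver {
    // One stored composite value: its version, the version it was based on, and its count.
    static final class Update {
        final String version, baseVersion;
        final int count;
        Update(String version, String baseVersion, int count) {
            this.version = version;
            this.baseVersion = baseVersion;
            this.count = count;
        }
    }

    // Resolve divergent updates: base count plus each update's increment over the value it was based on.
    static int resolve(int baseCount, String baseVersion, List<Update> updates) {
        Map<String, Integer> countByVersion = new HashMap<>();
        countByVersion.put(baseVersion, baseCount);
        for (Update u : updates) countByVersion.put(u.version, u.count);

        int total = baseCount;
        for (Update u : updates) {
            total += u.count - countByVersion.get(u.baseVersion);   // this update's own increment
        }
        return total;
    }

    public static void main(String[] args) {
        // The tree above: 10 -> {11, 11, 11}, and one of those 11s -> 12. True count: 14.
        List<Update> updates = Arrays.asList(
                new Update("v1a", "v0", 11),
                new Update("v1b", "v0", 11),
                new Update("v1c", "v0", 11),
                new Update("v2", "v1a", 12));
        System.out.println(resolve(10, "v0", updates));   // prints 14
    }
}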

It shouldn't be rocket science

If all you want is a simple counter, this is way overkill. It shouldn't be rocket science to make a simple counter, which is why SimpleDB may not be the best choice for making simple counters.

That isn't the only way, but most of those things will need to be done if you implement a SimpleDB solution in lieu of actually having a lock.

Don't get me wrong, I actually like this method precisely because there is no lock, and the bound on the number of processes that can use this counter simultaneously is around 100 (because of the limit on the number of attributes in an item). And you can get beyond 100 with some changes.

Note

But if all these implementation details were hidden from you and you just had to call increment(key), it wouldn't be complex at all. With SimpleDB the client library is the key to making the complex things simple. But currently there are no publicly available libraries that implement this functionality (to my knowledge).

Mocky
Thanks for your in-depth response! The easy way: I had been thinking about this pattern, but the number of transactions it would require is too great. Due to Amazon's SimpleDB pricing structure (paying by machine-hours), this would lead to a significant increase in cost. The hard way: hm... this definitely seems better, although it seems like it would still end up being resource-intensive on our instances. With our scale, having a lock is out of the question. Maybe one of these solutions is what we need.
Justin
Unless you are doing a **lot** of counter checking without caching, I would think that the easy way would result in fewer than 1/2 the transactions when compared with the hard way.
Mocky
The thing is that we don't want to be counting through 70,000 rows 1000 times (hits x users) every minute... and the whole point of this approach is to avoid storing the hits in a cache.
Justin
Justin: Check out Stephen McCarthy's response. SDB's (relatively new) conditional puts make this easier, perhaps even easier than "The Easy Way," depending on the API you are using. Ashley Tate also pointed this out.
The Alchemist
+2  A: 

Hi Justin

I see you've accepted an answer already, but this might count as a novel approach.

If you're building a web app, then you can use Google's Analytics product to track page impressions (if the page-to-domain-item mapping fits) and then use the Analytics API to periodically push that data up into the items themselves.

I haven't thought this through in detail so there may be holes. I'd actually be quite interested in your feedback on this approach given your experience in the area.

Thanks Scott

objektivs
Hey Scott, thanks for the idea. Using Google's stats does sound interesting, but I haven't had much experience with their Analytics products... For my use, I don't think this solution would work too well. We need our data to stay updated quickly (max delay of 5 minutes), and I don't think you can get that from Google (avoiding duplicate hits) without some hackery. The other (bigger) issue is that we have to keep legacy support, so unless I want to do some redirecting (which adds more overhead and latency), that might be another problem.
Justin
+2  A: 

For anyone interested in how I ended up dealing with this... (slightly Java-specific)

I ended up using an EhCache on each servlet instance, with the UUID as the key and a Java AtomicInteger as the value. Periodically, a thread iterates through the cache and pushes rows to a SimpleDB temp stats domain, as well as writing a row with the key to an invalidation domain (which fails silently if the key already exists). The thread then decrements the counter by the value it just pushed, ensuring that we don't miss any hits that arrive while it's updating. A separate thread polls the SimpleDB invalidation domain and rolls up the stats in the temporary domains (there are multiple rows for each key, since we're running multiple ec2 instances), pushing the result to the actual stats domain.
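
For illustration, here is a stripped-down sketch of the counting half of that pattern. A ConcurrentHashMap of AtomicIntegers stands in for the Ehcache region, and the SimpleDB push (temp stats domain plus invalidation row) is left as a stub; the key point is that the flush subtracts exactly what it pushed, so hits that arrive mid-update are kept for the next round.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LocalHitCounter {
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();

    public LocalHitCounter(long flushPeriodSeconds) {
        flusher.scheduleAtFixedRate(this::flush, flushPeriodSeconds, flushPeriodSeconds, TimeUnit.SECONDS);
    }

    // Called on every hit; cheap and lock-free.
    public void recordHit(String uuid) {
        counters.computeIfAbsent(uuid, k -> new AtomicInteger()).incrementAndGet();
    }

    // Periodically push each counter's current value, then subtract exactly what was pushed.
    private void flush() {
        for (Map.Entry<String, AtomicInteger> entry : counters.entrySet()) {
            int pushed = entry.getValue().get();
            if (pushed == 0) continue;
            pushToSimpleDB(entry.getKey(), pushed);   // temp stats domain + invalidation row, as described above
            entry.getValue().addAndGet(-pushed);      // hits recorded during the push stay in the counter
        }
    }

    private void pushToSimpleDB(String uuid, int count) {
        // SimpleDB write omitted; see the temp-stats / invalidation-domain scheme described above.
    }
}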

I've done a little load testing, and it seems to scale well. Locally I was able to handle about 500 hits/second before the load tester broke (not the servlets - hah), so if anything I think running on ec2 should only improve performance.

Justin
+2  A: 

To anyone revisiting this issue, Amazon just added support for conditional puts, which makes implementing a counter much easier.

Now, to implement a counter, simply call GetAttributes, increment the count, and then call PutAttributes with the expected value set correctly. If Amazon responds with a ConditionalCheckFailed error, then retry the whole operation.

Note that you can only have one expected value per PutAttributes call. So, if you want to have multiple counters in a single row, then use a version attribute.

pseudo-code:

begin
  attributes = SimpleDB.GetAttributes
  initial_version = attributes[:version]
  attributes[:counter1] += 3
  attributes[:counter2] += 7
  attributes[:version] += 1
  SimpleDB.PutAttributes(attributes, :expected => {:version => initial_version})
rescue ConditionalCheckFailed
  retry
end
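
For reference, a rough Java equivalent of that pseudocode using the AWS SDK for Java's SimpleDB client. The attribute names are illustrative, and the very first write of a brand-new item would need an "attribute does not exist" condition instead of the version check shown here.

import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.model.Attribute;
import com.amazonaws.services.simpledb.model.GetAttributesRequest;
import com.amazonaws.services.simpledb.model.GetAttributesResult;
import com.amazonaws.services.simpledb.model.PutAttributesRequest;
import com.amazonaws.services.simpledb.model.ReplaceableAttribute;
import com.amazonaws.services.simpledb.model.UpdateCondition;

import java.util.Arrays;
import java.util.List;

public class ConditionalCounter {
    private final AmazonSimpleDB sdb;
    private final String domain;

    public ConditionalCounter(AmazonSimpleDB sdb, String domain) {
        this.sdb = sdb;
        this.domain = domain;
    }

    // Read-increment-write guarded by an expected 'version' attribute; retry on conflict.
    public void increment(String itemName, int delta1, int delta2) {
        while (true) {
            GetAttributesResult current = sdb.getAttributes(
                    new GetAttributesRequest(domain, itemName)
                            .withAttributeNames("version", "counter1", "counter2")
                            .withConsistentRead(true));

            long version = 0, counter1 = 0, counter2 = 0;
            for (Attribute a : current.getAttributes()) {
                if ("version".equals(a.getName())) version = Long.parseLong(a.getValue());
                if ("counter1".equals(a.getName())) counter1 = Long.parseLong(a.getValue());
                if ("counter2".equals(a.getName())) counter2 = Long.parseLong(a.getValue());
            }

            List<ReplaceableAttribute> updated = Arrays.asList(
                    new ReplaceableAttribute("counter1", Long.toString(counter1 + delta1), true),
                    new ReplaceableAttribute("counter2", Long.toString(counter2 + delta2), true),
                    new ReplaceableAttribute("version", Long.toString(version + 1), true));

            try {
                // The put only succeeds if 'version' still holds the value we read above.
                sdb.putAttributes(new PutAttributesRequest(domain, itemName, updated,
                        new UpdateCondition("version", Long.toString(version), true)));
                return;
            } catch (AmazonServiceException e) {
                if (!"ConditionalCheckFailed".equals(e.getErrorCode())) throw e;
                // Another writer won the race; loop to re-read and retry.
            }
        }
    }
}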
Stephen McCarthy
Given Amazon's time to get to "eventual consistency", does this method work well for counters? Or do you find that with a lot of concurrency you have a lot of contention and it takes a long time to update the counter?
Hortitude
This method would probably not work well if the row is being written to frequently, since it will be retrying often. I have implemented it in production for an element that has low-frequency writes, though, and it works great (we typically do all of our SDB PutAttributes calls in a background queue anyway). You'll probably want to do some analysis on how often you get ConditionalCheckFailed errors before implementing this.
Stephen McCarthy
@Stephen: I've had really good success writing hundreds of items a second (via `BatchPutAttributes`) into a single domain, so unless the counter is incrementing that often, I don't think it'll cause any performance issues.
The Alchemist
Unfortunately, BatchPutAttributes doesn't support expected attributes, so you're limited to a much lower put frequency when using this method (approximately 25 calls to PutAttributes per second per domain).
Stephen McCarthy