I am not sure how to handle this in a BigTable datastore.

Imagine the following example (it only illustrates the concept and does not match my actual data model):

  • I have a Counter entity that keeps track of the number of Transactions in my datastore. Let's say the current 'count' is 100.
  • Now two web requests read this value at the same time.
  • Both web requests add a new Transaction.
  • Finally, both update the counter (to 101).

The counter value is now inaccurate. It should be 102.
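
To make the race concrete, here is a minimal sketch that replays the steps above in a fixed order. The `datastore` dict and the helper functions are hypothetical in-memory stand-ins, not a real datastore API.

```python
# Hypothetical in-memory stand-in for the datastore; not a real API.
datastore = {"count": 100, "new_transactions": []}

def read_count():
    return datastore["count"]

def add_transaction(name):
    datastore["new_transactions"].append(name)

def write_count(value):
    datastore["count"] = value

# Both requests read the counter before either one writes it back.
seen_by_a = read_count()
seen_by_b = read_count()

add_transaction("from request A")
add_transaction("from request B")

write_count(seen_by_a + 1)
write_count(seen_by_b + 1)  # overwrites A's update with the same value

stored = datastore["count"]
expected = 100 + len(datastore["new_transactions"])
print(stored, expected)  # the stored count is one less than the expected count
```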

Any suggestions on how to handle this situation? Can I 'lock' the counter to ensure that the second web request doesn't even read it until the first web request completes?

+4  A: 

You have several options:

  • Depending on the scope of your counter and your entities, have the Transaction entities be child entities of the counter. Then, you can insert a transaction and update the counter transactionally. Bear in mind that this limits the update rate for the whole entity group to about 1-5 QPS.
  • If your counts don't have to be 100% accurate, insert the entity and update the counter (using a single-entity transaction) separately. You can run a regular cron job to re-count the number of entities and fix the counter if errors have pushed it out of sync.
  • You could build your own limited distributed transaction support.
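
The second option (update separately, then repair periodically) could look roughly like the sketch below. Everything here is a hypothetical in-memory stand-in: `entities`, `counter`, and the `counter_update_fails` flag only simulate a counter update that was lost after the entity was inserted.

```python
# Hypothetical in-memory stand-in for the datastore.
entities = []
counter = {"count": 0}

def insert_entity(name, counter_update_fails=False):
    # The entity insert and the counter update are two separate steps,
    # so the second one can fail while the first one succeeds.
    entities.append(name)
    if not counter_update_fails:
        counter["count"] += 1

def recount():
    """What the cron job would do: replace the counter with the true total."""
    drift = len(entities) - counter["count"]
    counter["count"] = len(entities)
    return drift

insert_entity("t1")
insert_entity("t2", counter_update_fails=True)  # counter falls behind here
insert_entity("t3")

drift = recount()
print(drift, counter["count"])
```

On a real datastore the recount would be a query over the entities rather than `len()` on a list, and it costs more as the data grows, which is why it is run occasionally rather than on every request.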
Nick Johnson
Thanks Nick. When you say "insert a transaction", do you mean executing a function in db.run_in_transaction? When I do a "read" in my transactional function, will it 'lock' the object and throw an error if another thread tries to access a value that is now "outdated"? Thanks, I'm still new to the way BigTable handles transactions :)
willem
Well, you called your entities "Transactions", so when I said "insert a transaction", I meant "insert a 'transaction' entity". Reads inside transactions are transactional, though, yes, but they rely on optimistic concurrency rather than locks.
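
As a rough illustration of optimistic concurrency in general (not of how the datastore implements it), the sketch below commits a write only if the entity is unchanged since it was read, and retries otherwise. `VersionedStore`, `ConflictError`, `increment_with_retry`, and the `interfere` hook are all hypothetical names made up for this example.

```python
class ConflictError(Exception):
    """Raised when the entity changed between read and commit."""

class VersionedStore:
    # Hypothetical in-memory stand-in for one datastore entity.
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def commit(self, new_value, read_version):
        # Optimistic check: succeed only if nobody committed since our read.
        if read_version != self.version:
            raise ConflictError()
        self.value = new_value
        self.version += 1

def increment_with_retry(store, retries=3, interfere=None):
    for attempt in range(retries):
        value, version = store.read()
        if interfere is not None and attempt == 0:
            interfere(store)  # simulate another request committing first
        try:
            store.commit(value + 1, version)
            return attempt + 1  # number of attempts used
        except ConflictError:
            continue  # re-read and try again
    raise ConflictError("gave up after %d attempts" % retries)

def other_request(store):
    value, version = store.read()
    store.commit(value + 1, version)

counter = VersionedStore(100)
attempts = increment_with_retry(counter, interfere=other_request)
print(counter.value, attempts)
```

Nothing is blocked while a request reads; the conflict is only detected at commit time, and the losing request repeats its read and write against the fresh value.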
Nick Johnson
+1  A: 

In addition to the options Nick gives, you could consider sharding the counter.

Keep multiple counters, and pick one to update in such a way that it is (ideally) impossible or (failing that) unlikely that any two requests will simultaneously pick the same shard.

You then have further options. You could do a transaction with the shard as parent (this reduces contention compared with a single counter), although you'll end up with your new Transaction entity having a parent chosen arbitrarily. Or don't bother with a transaction, in which case you'll probably have to fix the count from time to time, as with Nick's non-transaction option.

To read the total count, you add up all the shards. You won't be reading them all "at the same time", but that's usually fine. Any counter might increase between when you read it and when you use the value, so the value is really just a lower bound. Adding up the shards is no different, except that it probably takes longer.
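
A minimal sketch of the idea, using a plain list as a hypothetical stand-in for the shard entities. `NUM_SHARDS` is an assumed value, and picking a shard at random is just one way to make collisions unlikely.

```python
import random

NUM_SHARDS = 20  # assumed shard count; tune to the expected write rate

# Hypothetical in-memory stand-in: one integer per shard entity.
shards = [0] * NUM_SHARDS

def increment(rng=random):
    # Pick a shard at random so concurrent requests rarely hit the same one.
    index = rng.randrange(NUM_SHARDS)
    shards[index] += 1
    return index

def total():
    # Reading the count means summing every shard.
    return sum(shards)

for _ in range(1000):
    increment()

print(total())
```

On a real datastore each shard would be its own entity, updated in its own small transaction, so contention drops roughly in proportion to the number of shards.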

Steve Jessop
A: 

Can Spring's @Transactional on the service layer help with this? Looking forward to hearing your comments.

cometta