I'm thinking of dozens of concurrent jobs writing to the same datastore Model. Does the datastore scale regardless of the number of concurrent puts?

+2  A: 

The datastore can only handle so many writes per second to any given entity. Trying to write to a specific entity too quickly leads to contention, as described in Avoiding datastore contention. That article recommends sharding an entity if you expect to update it consistently more than once or twice per second.
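To illustrate the sharding idea, here's a minimal sketch in plain Python (no App Engine SDK; the shard count and the key-naming scheme are assumptions for the example). Each write picks one of N distinct key names instead of always hitting the same entity:

```python
import random

NUM_SHARDS = 20  # assumed shard count; tune to your expected write rate


def shard_key_name(counter_name):
    """Pick a random shard so concurrent writes spread across
    NUM_SHARDS entities instead of contending on a single one."""
    index = random.randint(0, NUM_SHARDS - 1)
    return "%s-shard-%d" % (counter_name, index)


# Each write targets one of 20 distinct key names,
# e.g. "page_views-shard-7"; reads sum over all shards.
key = shard_key_name("page_views")
```

Reading the total then means fetching all NUM_SHARDS entities and summing them, which trades a slightly more expensive read for much higher write throughput.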

The datastore is optimized for reads, but if your concurrent jobs are writing to separate entities (even if they are within the same model) then your application should scale; throughput will depend on how long your request handlers take to execute.

David Underhill
In my case, I'm not actually touching the same entity. Every task is creating a new entity, so any contention would be at the higher (Kind) level. I'm continuously creating new and deleting old entities.
Greg
Continuously creating new and deleting old entities won't cause entity-level contention, though your writes could be slowed by contention on index updates. The following two articles go into the details of how writes are performed: [Life of a Datastore Write](http://code.google.com/appengine/articles/life_of_write.html) and [How Index Building Works](http://code.google.com/appengine/articles/index_building.html)
David Underhill
This answer is correct, except that contention is at the entity _group_ level, not the entity level.
Nick Johnson
+1  A: 

There is no contention for entity kinds - only for entity groups (entities with the same parent entity). Since you say you're writing to a new entity each time, you should be able to scale arbitrarily.

One subtlety remains, however: if you're inserting entities at a high rate (hundreds per second) and you're using the default auto-generated IDs, you can get 'hot tablets', which can cause contention. If you expect that high a rate of insertions, you should use key names, and select a key that doesn't cluster the way auto-generated IDs do - examples would be an email address, or a randomly generated UUID.
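A minimal sketch of the UUID suggestion, using only the Python standard library (the helper name is mine, not from the answer). Random UUIDs distribute uniformly across the keyspace, so successive inserts don't land on the same tablet the way sequential auto-IDs do:

```python
import uuid


def make_key_name():
    """Return a random 32-character hex string to use as a key name.

    Unlike sequential auto-generated IDs, these values don't cluster,
    so a high insert rate is spread across the keyspace rather than
    concentrating on one 'hot' tablet.
    """
    return uuid.uuid4().hex


# Two consecutive inserts get unrelated key names:
k1 = make_key_name()
k2 = make_key_name()
```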

Nick Johnson
I'll experiment with the key names... Right now, I do as many as 450 puts in a batch call.
Greg
Batch puts aren't an issue - it's the rate of put operations with sequential IDs that matters.
Nick Johnson
@nick just so I understand you correctly - do you mean batch puts don't suffer from the performance issue you describe in your answer? That would be consistent with the performance analysis I've done so far. Using random UUIDs does not improve the batch put call.
Greg
Correct. They're somewhat higher overhead than a put for a single entity, but nowhere near as bad as individual puts for every entity you're storing.
Nick Johnson