views: 749, answers: 3

The objective is to reduce the CPU cost and response time for a piece of code that runs very often and must db.get() several hundred keys each time.

Does this even work?

Can I expect the API time of a db.get() with several hundred keys to decrease roughly linearly as I reduce the size of each entity? Currently the entity has the following data attached: 9 String, 9 Boolean, 8 Integer, 1 GeoPt, 2 DateTime, 1 Text (avg size ~100 bytes FWIW), 1 Reference, 1 StringList (avg size 500 bytes). The goal is to move the vast majority of this data into related models so that the core fetch of the main model is quick.

If it does work, how is it implemented?

After a refactor, will I still incur the same high cost fetching existing entities? The documentation says that all properties of a model are fetched simultaneously. Will the old unneeded properties still transfer over RPC on my dime and while users wait? In other words: if I want to reduce the load time of my entities, is it necessary to migrate the old entities to ones with the new definition? If so, is it sufficient to re-put() the entity, or must I save under a wholly new key?

Example

Consider:

class Thing(db.Model):
    text    = db.TextProperty()
    strings = db.StringListProperty()
    num     = db.IntegerProperty()

thing = Thing(key_name='thing1',
              text='x' * 10240,
              strings=['y' * 500 for i in range(10)],
              num=23)
thing.put()

Let's say I re-define Thing to be streamlined and push up a new version:

class Thing(db.Model):
    num = db.IntegerProperty()

And I fetch it again:

thing_again = Thing.get_by_key_name('thing1')

Have I reduced the fetch time for this entity?

A: 

if I want to reduce the size of my entities, is it necessary to migrate the old entities to ones with the new definition?

Yes. The GAE datastore is just a big key-value store that doesn't know anything about your model definitions. So the old values will remain the old values until you put new values in!
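That key-value behavior can be illustrated with a plain dict standing in for the entity table (an editor's sketch, not the GAE API; the property values are taken from the question's example):

```python
# Editor's sketch: a plain dict stands in for the datastore's entity
# table.  The store records whatever properties were written; it never
# consults the current Model class definition.
store = {}

# put() under the original, "wide" definition of Thing
store['thing1'] = {'text': 'x' * 10240,
                   'strings': ['y' * 500 for i in range(10)],
                   'num': 23}

# Redefining the Model class changes nothing here: a later fetch still
# returns (and transfers) every property that was stored.
fetched = store['thing1']
print(sorted(fetched))  # ['num', 'strings', 'text']

# Only re-putting a trimmed entity shrinks what is stored and fetched.
store['thing1'] = {'num': fetched['num']}
print(sorted(store['thing1']))  # ['num']
```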

Jonathan Feinberg
Thanks, Jonathan. Certainly the old values will remain. I have rephrased in response to your answer. What I want to know is, after changing the model definition, will I immediately see a speedup when fetching objects (because the underlying db.Model will no longer fetch those properties that were removed); or will I need to save a new entity with only the reduced set of properties to see a speedup when fetching that entity?
jhs
If you merely change the old model, then you will lose your ability to fetch instances created with the old model. You must create an entirely new model and migrate the old entities to it. To answer your question: the cost of the fetch is the expense, not the cost of deserializing the fetched blob into an instance of your model.
Jonathan Feinberg
Thanks again, Jonathan. I know that the fetch is the expense, not deserializing. I have added an example to the question. (And now that I've done that I may just run some benchmarks to be more certain.)
jhs
The answer is still "You have not reduced the fetch time, since the old model is still stored in the table, and the whole thing is being sent over the wire."
Jonathan Feinberg
"If you merely change the old model, then you will lose your ability to fetch instances created with the old model." - this is completely untrue. You can still fetch entities from before the modification.
Nick Johnson
True enough... I meant to imply that the new properties won't automagically be there, and the old ones will be inaccessible! But it is true that you can simply remove properties and the old ones will still be there. Thanks for clarifying.
Jonathan Feinberg
+5  A: 

To answer your questions in order:

  • Yes, splitting up your model will reduce the fetch time, though probably not linearly. For a relatively small model like yours, the differences may not be huge. Large list properties are the leading cause of increased fetch time.
  • Old properties will still be transferred when you fetch an entity after the change to the model, because the datastore has no knowledge of models.
  • Also, however, deleted properties will still be stored even after you call .put(). Currently, there are two ways to eliminate the old properties: replace all the existing entities with new ones, or use the lower-level api.datastore interface, which is dict-like and makes it easy to delete keys.
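The second option can be sketched as follows (an editor's illustration, not from the answer: the actual SDK calls are shown only in comments, the helper name is hypothetical, and a plain dict stands in for the dict-like Entity):

```python
def strip_stale_properties(entity, stale=('text', 'strings')):
    """Remove leftover properties from a dict-like datastore entity.

    In the low-level interface, google.appengine.api.datastore.Entity
    subclasses dict, so ordinary dict operations work on it.
    """
    for prop in stale:
        entity.pop(prop, None)  # drop if present, ignore if already gone
    return entity

# With the real low-level API the round trip would look roughly like:
#   from google.appengine.api import datastore
#   entity = datastore.Get(datastore.Key.from_path('Thing', 'thing1'))
#   strip_stale_properties(entity)
#   datastore.Put(entity)  # the stored entity no longer has the old props

# Demonstration on a stand-in dict:
entity = {'text': 'x' * 10240, 'strings': ['y' * 500] * 10, 'num': 23}
strip_stale_properties(entity)
print(sorted(entity))  # ['num']
```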
Nick Johnson
+1 for how to delete properties, I have been wondering about this for a day or two
Brandon Thomson
Thanks. I would have thought a model with nearly 30 properties including lists and texts would be, if not large then worthy of optimization.
jhs
30 properties and a 30-item list are more or less equivalent - so as you can imagine, it's possible to have much larger models. :)
Nick Johnson
Thanks again Nick! In my case, 30 properties, one of which is a 30-item StringList :) I will follow up on the forum with the results of the "schema" refactor.
jhs
FYI, here is the forum [thread](http://groups.google.com/group/google-appengine/browse_thread/thread/5a00e44cc56ae0d6/901cdabfc1031c15 "Will reducing model size improve performance?") which jhs was referring to.
David Underhill
+1  A: 

To remove properties from an entity, you can change your Model to an Expando, and then use delattr. It's documented in the App Engine docs here:

http://code.google.com/intl/fr/appengine/articles/update_schema.html

Under the heading "Removing Deleted Properties from the Datastore"
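That recipe can be sketched like this (an editor's illustration: the SDK calls appear only in comments, and the model and property names are taken from the question's example):

```python
# Editor's sketch of the article's recipe.  With the real SDK it would
# read roughly:
#
#   from google.appengine.ext import db
#
#   class Thing(db.Expando):        # temporarily redeclared; was db.Model
#       num = db.IntegerProperty()
#
#   thing = Thing.get_by_key_name('thing1')
#   for prop in ('text', 'strings'):
#       if hasattr(thing, prop):
#           delattr(thing, prop)    # Expando deletes the stored property
#   thing.put()                     # re-put persists only what remains
#
# delattr works because an Expando exposes every stored property as an
# instance attribute; a plain object shows the same mechanics:
class FakeEntity(object):  # stand-in for an Expando instance
    pass

thing = FakeEntity()
thing.text, thing.num = 'x' * 10240, 23
delattr(thing, 'text')
print(hasattr(thing, 'text'), thing.num)  # False 23
```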

Danny Tuppeny