In general, it's better to do a single query vs. many queries for a given object. Let's say I have a bunch of 'son' objects each with a 'father'. I get all the 'son' objects:

sons = Son.all()

Then, I'd like to get all the fathers for that group of sons. I do:

father_keys = {}
for son in sons:
    father_keys.setdefault(son.father.key(), None)

Then I can do:

fathers = Father.get(father_keys.keys())

Now, this assumes that son.father.key() doesn't actually go fetch the object. Am I wrong on this? I have a bunch of code that assumes the object.related_object.key() doesn't actually fetch related_object from the datastore.

Am I doing this right?

+7  A: 

You can find the answer by studying the sources of google.appengine.ext.db in your download of the App Engine SDK sources -- and the answer is, no, there's no special-casing of the kind you need: the __get__ method (line 2887 in the sources for the 1.3.0 SDK) of the ReferenceProperty descriptor gets invoked before knowing if .key() or anything else will later be invoked on the result, so it just doesn't get a chance to do the optimization you'd like.

However, see line 2929: method get_value_for_datastore does do exactly what you want!

Specifically, instead of son.father.key(), use Son.father.get_value_for_datastore(son) and you should be much happier as a result;-).
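To illustrate why the two spellings differ, here is a minimal sketch using a toy descriptor. It is not the SDK's code: `ToyReferenceProperty`, `Entity`, and the `FETCHES` log are hypothetical stand-ins, and string keys replace real `db.Key` objects. It only mirrors the mechanism described above: attribute access goes through `__get__` and dereferences the key, while `get_value_for_datastore` hands back the stored key untouched.

```python
# Toy stand-in for the descriptor behaviour described above; NOT the SDK's code.
FETCHES = []  # records every simulated datastore round trip


class Entity:
    """Stands in for a fetched model instance."""

    def __init__(self, key):
        self._key = key

    def key(self):
        return self._key


class ToyReferenceProperty:
    """Hypothetical, simplified analogue of db.ReferenceProperty."""

    def __set_name__(self, owner, name):
        self._attr = "_" + name  # where the raw key is stored on the instance

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class access (Son.father) yields the property itself
        key = getattr(instance, self._attr)
        FETCHES.append(key)  # the dereference happens here, before .key() runs
        return Entity(key)

    def __set__(self, instance, key):
        setattr(instance, self._attr, key)

    def get_value_for_datastore(self, instance):
        # Returns the stored key without dereferencing it.
        return getattr(instance, self._attr)


class Son:
    father = ToyReferenceProperty()

    def __init__(self, father_key):
        self.father = father_key


sons = [Son("father-1"), Son("father-2"), Son("father-1")]

# Spelling from the question: every access dereferences the father first.
slow_keys = {son.father.key() for son in sons}
fetches_via_attribute = len(FETCHES)

FETCHES.clear()

# Spelling suggested above: the keys are read straight off each son.
fast_keys = {Son.father.get_value_for_datastore(son) for son in sons}
fetches_via_property = len(FETCHES)
```

In this toy, both spellings yield the same set of keys, but only the first one records any simulated fetches. With the real SDK, the resulting keys can then be passed to a single batched get, as the question already does.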

Alex Martelli
Kick ass answer. My app just got 10x faster and cheaper :D
Sudhir Jonathan
@Sudhir, glad it helped, thanks for letting me know it did!-)
Alex Martelli
Speaking of which, do objects get cached when they are retrieved from the store? So if i did Father.get(all_keys) in the handler, would the runtime try to get them again in the template if I'm just printing out the sons with their fathers?
Sudhir Jonathan
@Sudhir, caching is not automatic (the object in the store might after all be modified by another query between any two times it's used, how would the datastore invalidate the cache then?!), you can use `memcached` for _explicit_ in-memory caching with very explicit control (it's explicitly your app's job to ensure any change to an entity refreshes or invalidates the cache appropriately, then!-).
Alex Martelli
@Alex Yeah, I thought you might say that :( Trouble here is that we'll now need to elaborately construct template data, rather than just passing the models in.
Sudhir Jonathan
@Sudhir, or just give the model class an accessor method that checks (and updates if needed) a memcached cache, what's so hard about that?
Alex Martelli
@Alex, certainly possible, but even that's unnecessary - why should printing 100 sons' fathers result in 100 memcache calls when the handler already had all the father objects ready and waiting?
Sudhir Jonathan
Don't forget to click the little checkmark to mark this answer as the one you wanted.
Travis Bradshaw
@Sudhir, why would each father need a separate `memcached` call?! Just cache the lot -- **if** you know it will never, **ever** need to change "under the covers" (the lack of such knowledge in the datastore is the full answer to your perhaps-rhetorical "why?"...;-).
Alex Martelli
@Travis, Sudhir's not the OP, so he can't "click the little checkmark" -- he **did** click it, on the highly related question he asked separately shortly after @sotangochips asked **this** one;-).
Alex Martelli
@Alex, that's true if we keep the fathers in a separate list... I'm trying to see how to get `for son in sons: print son.father` for 100 sons to return the father without calling memcache... even if I overrode the .father accessor to check memcache, wouldn't that result in 100 calls? Or I could use a static map to keep all the fathers. I just had an idea to massively speed up templates like this... Is there some place I can share it with you other than this question's comments?
Sudhir Jonathan
@Sudhir, yes, whatever you call the accessor, whoever keeps the cache (you or Google's db code), there's **no** way calling `son.father` on 100 different values of `son` can avoid 100 accesses to the different sons' "father" attributes (even conceptually, absolutely zero way). Beyond this question's comments, SO makes it easy for you to open a **new** question about this!-) But I'm going to bed now so I won't see it until tomorrow morning anyway;-).
Alex Martelli
See http://stackoverflow.com/questions/2076470/speeding-up-templates-in-gae-py-by-aggregating-rpc-calls
Sudhir Jonathan
A: 

I'd rather loop through the sons and get the parents' keys using son.parent_key().

parent_key()

Returns the Key of the parent entity of this instance, or None if this instance does not have a parent.

Since the whole ancestor path is stored in the instance's key, in theory there is no need to hit the datastore again to get the parent's key.
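As a rough sketch of why no datastore access is needed, the toy key below stores its full path and derives the parent key from it. `ToyKey` is a hypothetical stand-in, not the SDK's `db.Key`, and the kinds and ids are made up. Note that this approach only applies when each father was set as the son's entity parent at creation time (the `parent=` argument), which is a different relationship from a ReferenceProperty like the one in the question.

```python
# Hypothetical, simplified key: a tuple of (kind, id) pairs from root to entity.
class ToyKey:
    def __init__(self, *path):
        self.path = tuple(path)

    def parent(self):
        # The parent's key is the same path minus its last element,
        # so computing it needs no datastore access.
        if len(self.path) <= 1:
            return None  # a root entity has no parent
        return ToyKey(*self.path[:-1])

    def __eq__(self, other):
        return isinstance(other, ToyKey) and self.path == other.path

    def __hash__(self):
        return hash(self.path)


father_key = ToyKey(("Father", 1))
son_key = ToyKey(("Father", 1), ("Son", 42))

derived_parent = son_key.parent()
```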

After that, it's possible to get all parents' instances at once using db.get().

get(keys)

Gets the entity or entities for the given key or keys, of any Model.

Arguments:

keys A Key object or a list of Key objects.

If one Key is provided, the return value is an instance of the appropriate Model class, or None if no entity exists with the given Key. If a list of Keys is provided, the return value is a corresponding list of model instances, with None values when no entity exists for a corresponding Key.
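Because a list of keys yields a list of the same length with None for missing entities, the results can be paired back up with their keys. A minimal sketch of that pairing follows; `batch_get` is a hypothetical stand-in for `db.get(keys)` backed by a plain dict, and the keys and values are made up.

```python
def batch_get(keys, store):
    """Hypothetical stand-in for db.get(keys): one result per key, None if missing."""
    return [store.get(key) for key in keys]


store = {"father-1": "Father A", "father-2": "Father B"}  # made-up data
keys = ["father-1", "father-3", "father-2"]

results = batch_get(keys, store)

# Results come back in key order, so zip() pairs each key with its entity.
fathers_by_key = {k: v for k, v in zip(keys, results) if v is not None}
missing = [k for k, v in zip(keys, results) if v is None]
```

Keeping the fathers in a dict keyed by their keys lets each son look up its father locally once the single batched call has returned.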

jbochi