Google App Engine supports fetching entities by a list of keys: google.appengine.ext.db.get(keys).

I'd like to know whether there is any guarantee that the result list preserves the order of the keys (i.e. if keys = [k_1, k_2, k_3], then for the result [r_1, r_2, r_3] it is always true that r_i.key() == k_i).
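To make the property concrete, here is a minimal sketch of the check I have in mind. The helper name results_match_keys and the FakeEntity class are hypothetical stand-ins of mine (FakeEntity only models the key() method of a db.Model instance); a None slot is accepted because no entity may exist for that key.

```python
def results_match_keys(keys, results):
    """Return True if every non-None result sits at the position of its key."""
    if len(keys) != len(results):
        return False
    return all(r is None or r.key() == k for k, r in zip(keys, results))


class FakeEntity(object):
    """Hypothetical stand-in for a db.Model instance; only key() is modelled."""

    def __init__(self, key):
        self._key = key

    def key(self):
        return self._key
```

On App Engine the call would presumably look like results_match_keys(keys, db.get(keys)).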

As far as I know, the API performs IN selects by internally issuing one sub-select for each value in the IN list. I would expect the same to happen for db.get, so the call would preserve the key order.

However, I am not sure, and I cannot find any reference confirming that db.get is equivalent to an IN select or that no optimizations are applied to its execution. If the order is not guaranteed, the workaround would be quite simple: I would iterate and query for each key myself, so that I don't depend on the db.get implementation.

I have run some basic tests, and the results show that:

  1. db.get() performs best
  2. db.get() preserves the keys order
  3. the alternative Model.get_by_id (for which the order of results will always be guaranteed) is slower
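For reference, a timing comparison like the one above can be sketched roughly as follows. This is not the exact test I ran; time_call is a hypothetical helper, and it simply averages wall-clock time over a few repetitions of whatever callable it is given.

```python
import time


def time_call(fn, repeats=10):
    """Return the average wall-clock seconds per call of fn over `repeats` runs."""
    start = time.time()
    for _ in range(repeats):
        fn()
    return (time.time() - start) / repeats
```

With it, one could compare time_call(lambda: db.get(keys)) against time_call(lambda: MyModel.get_by_id(ids)), where MyModel, keys and ids are placeholders for your own model class and data.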

While the results seem to confirm my assumptions, I am wondering if others have investigated this and have reached similar or different conclusions.

tia, ./alex

Doing some more research, I have found the following in the documentation for both db.get() and Model.get():

If ids is a list, the method returns a list of model instances, with a None value when no entity exists for a corresponding Key.

Even though it doesn't state it explicitly, I think it is clear that the order is guaranteed.
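If the results do line up with the keys as the quoted sentence suggests, the None slots tell you which keys had no entity. A small sketch, assuming that positional correspondence holds (missing_keys is a hypothetical helper name, not part of the db API):

```python
def missing_keys(keys, results):
    """Return the keys whose slot in the result list is None."""
    return [k for k, r in zip(keys, results) if r is None]
```

For example, missing_keys(keys, db.get(keys)) would list the keys that were not found.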

A: 

The quote from the documentation clarifies my question.

alexpopescu
+2  A: 

You're correct: db.get returns entities in the same order as the keys you provided. The performance difference you observe is because it only has to make one round-trip to the database instead of many, and because it can fetch all the entities simultaneously rather than serially. It's not equivalent to 'SELECT ... IN ...', however: the datastore is based on Bigtable and you're selecting on the primary key, so it can do lookups directly on the table.
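The round-trip argument can be illustrated with a toy model. This is only a sketch: FakeDatastore is a hypothetical in-memory stand-in, not the App Engine API, and it merely counts how many calls reach the "database" for a batch get versus a per-key loop.

```python
class FakeDatastore(object):
    """Hypothetical in-memory stand-in that counts round trips."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.round_trips = 0

    def get(self, keys):
        """Batch lookup: one round trip, results in key order, None for misses."""
        self.round_trips += 1
        return [self._rows.get(k) for k in keys]


def get_one_by_one(store, keys):
    """Serial workaround: one round trip per key, same result order."""
    return [store.get([k])[0] for k in keys]
```

Both approaches return the same ordered list; the batch call just pays for one round trip where the loop pays for one per key.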

One thing to bear in mind when doing performance comparisons: Always do these on the production server, never on dev_appserver. The two have totally different performance characteristics.

Nick Johnson
My simple perf test was performed on the production server.
alexpopescu