ansaurus

Question

Speeding up templates in GAE-Py by aggregating RPC calls

Answer 1

+1 A:

I have been in a similar situation. Instead of ReferenceProperty, I had parent/child relationships but the basics are the same. My current solution is not polished but at least it is efficient enough for reports and things with 200-1,000 entities, each with several subsequent child entities that require fetching.

You can manually search for data in batches and set it if you want.

# Given the posts, fetches all the data the template will need
# with just 2 key-only loads from the datastore.
posts = get_the_posts()

author_keys = [Post.author.get_value_for_datastore(x) for x in posts]
authors = db.get(author_keys)

city_keys = [Author.city.get_value_for_datastore(x) for x in authors]
cities = db.get(city_keys)

for post, author, city in zip(posts, authors, cities):
  post.author = author
  author.city = city

Now when you render the template, no additional queries or fetches will be done. It's rough around the edges but I could not live without this pattern I just described.

Also you might consider validating that none of your entities are None because db.get() will return None if the key is bad. That is getting into just basic data validation though. Similarly, you need to retry db.get() if there is a timeout, etc.

(Finally, I don't think memcache will work as a primary solution. Maybe as a secondary layer to speed up datastore calls, but you need to work well if memcache is empty. Also, Memcache has several quotas itself such as memcache calls and total data transferred. Overusing memcache is a great way to kill your app dead.)

jhs 2010-01-16 09:29:52

The second code block is very helpful... I didn't think of zip and using it that way.. but the first block is actually a built in feature of the sdk... references are already cached after they're resolved once, so there's absolutely no need for that code..

Sudhir Jonathan 2010-01-17 06:59:13

You are correct. My code was copied from a similar situation not using ReferenceProperty. It would be nice to populate the property via some back-door rather than just blowing away the `.city` attribute. But I believe that would work in a pinch.

jhs 2010-01-17 09:00:21

Hmm.. I am reading the code in google/appengine/ext/db/__init__.py in the SDK. It looks like a simple assignment works fine because it will call the ReferenceProperty's __set__() method. I will update the answer to be shorter and clearer.

jhs 2010-01-17 09:05:27

Done :) Also I forgot to mention, I use itertools.izip in my real code because I used to hit MemoryErrors from time to time. It's probably not necessary in general though.

jhs 2010-01-17 09:09:01

Is the itertools version any faster? Why would there even be a difference? Zip seems like a very simple algo to me.

Sudhir Jonathan 2010-01-26 06:02:49

I agree! With App Engine, by far the most expensive thing (dollars and time) is RPC calls, especially the datastore. There is plenty of speed and memory back in the app server VM. However, I have switched to itertools for two reasons: first, Python 3 uses iterators much more so I want less headache if/when I port the code forward. Secondly, it is true you can get MemoryError pretty easily. Maybe you will save yourself from a bug one day.

jhs 2010-01-26 07:50:08

Answer 2

A:

Here's some great examples of pre-fetching...

http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine

Brandon Fields 2010-04-20 21:18:08

ansaurus

tags:

views:

answers:

Speeding up templates in GAE-Py by aggregating RPC calls

related questions