views:

46

answers:

3

I've been working with Google App Engine and I'm running into some slow performance with some of my data queries. I've read that designing an App Engine datastore is a different mindset from working with SQL databases and I'm not sure I'm doing this the best way. I have two questions to try to get on the right track:

Specifically:

I have a Foo type and a UserFoo type. Each UserFoo is an "instance" of a corresponding Foo and holds data specific to that instance. My Foo type has a fooCode property which is a unique identifier and I map each UserFoo with each Foo by using their fooCode properties. I then operate on each Foo with code like so:

foos = Foo.all().filter('bar =', bar)
for foo in foos:
    userFoo = UserFoo.all().filter('userKey =', user).filter('fooCode =', foo.fooCode)

Note: I'm using fooCode over a reference key so that we can easily delete and re-add new Foos and not have to then remap all the corresponding UserFoos.

In General:

What are typical approaches to designing GAE datastore tables and best-practices for using them?

+1  A: 

In General:

  • Denormalize as much as possible.

  • Refer to entities via their Key whenever possible; it is the fastest way to get data out of the datastore.

Specifically, you could likely get a dramatic increase in your performance if you used ReferenceProperty to establish relationships rather than codes that go into filters. I am going to guess that querying for a UserFoo's Foo happens a good deal more often than removing and remapping Foo's, yeah? In that case, the pragmatic and datastore-wise thing to do is to use Reference Properties.

Also, if the Foo-UserFoo relationship can be denormalized into a single entity, you remove the need for an entire series of queries altogether.

Adam Crossland
+1  A: 

This is the staircase of gets anti-pattern. The solution is ReferenceProperty pre-fetching.

The hitch is that you've decided not to use a ReferenceProperty. I would advise you to reconsider this choice.

Note: I'm using fooCode over a reference key so that we can easily delete and re-add new Foos and not have to then remap all the corresponding UserFoos.

Remember that an entity key is just an encoded representation of its path: the kind and name or ID of the entity and any of its ancestors. If you deleted and then re-created a Foo, it would only have a different key if it was given a different name or ID. If you have some way of giving the old and new entity the same fooCode, you could just as easily use the fooCode as the key name, which would allow a deleted and then re-added Foo to retain its original key.

Drew Sears
Reading these links makes a lot of sense. I'm reworking my datastore now. Thanks to all who pointed this out!
Nick Gotch
+1  A: 

I would suggest the following changes:

  1. Use a ReferenceProperty in UserFoo to refer to its Foo, or make it a child entity if appropriate. I don't understand your comment about remapping existing entities - that should never be necessary.
  2. Add a copy of the 'bar' property to each UserFoo
  3. Perform a single query, for UserFoo.all().filter('bar =', bar).order('userKey'). Results will be filtered by bar and grouped by user, and only require a single query, instead of one per user.
  4. Fetch the results at once by calling .fetch() on the query, rather than iterating over them. This is much more efficient.
  5. Use ReferenceProperty prefetching to retrieve the Foo object for each UserFoo, if you need it.
Nick Johnson