views:

68

answers:

2

Is there any clever way to avoid making a costly query with an IN clause in cases like the following one?

I'm using Google App Engine to build a Facebook application and at some point I (obviously) need to query the datastore to get all the entities that belong to any of the facebook friends of the given user.

Suppose I have a couple of entities modeled as such:

class Thing(db.Model):
    owner = db.ReferenceProperty(reference_class=User, required=True)
    owner_id = db.StringProperty(required=True)
    ...

and

class User(db.Model):
    id = db.StringProperty(required=True)
    ...

At some point I query Facebook to get the list of friends of a given user and I need to perform the following query

# get all Thing instances that belong to friends
query = Thing.all()
query.filter('owner_id IN', friend_ids)

If I did that, AppEngine would perform a subquery for each id in friend_ids, probably exceeding the maximum number of subqueries any query can spawn (30).

Is there any better way to do this (i.e. minimizing the number of queries)? I understand that there are no relations and joins using the datastore but, in particular, I would consider adding new fields to the User or Thing class if it helps in making things easier.

+3  A: 

I don't think there's an elegant solution, but you could try this:

On the User model, use Facebook ID as the key name, and store each user's list of things in a ListProperty.

class Thing(db.Model):
  ...

class User(db.Model):
  things = db.ListProperty(db.Key)
  ...

Entity creation would go like this:

user = User.get_or_insert(my_facebook_id)

thing = Thing()
thing.put()

user.things.append(thing.key())
user.put()

Retrieval takes 2 queries:

friends = User.get_by_key_name(friend_ids)
thing_keys = []

for friend in friends:
  thing_keys.extend(friend.things)

things = db.get(thing_keys)
Drew Sears
+1 Another option is to make Things children to User allows for ancestor queries for a specific type of thing to be returned. The use of key_names is critical to this really working.
kevpie
That's great, I even made Things children to User as suggested by kevpie. I had to deal with a couple more issues, though: a) I don't store a User entity for each friend_id, so I need to filter the None values I get when querying using get_by_key_name; b) I have to filter out things by some other fields as well, but I do that on the entities I fetch after I get them from the DataStore. Is there any better way to do that?
abahgat
Be sure to watch the talks Nick posted in his answer. You may want to use an index entity combined with a list property. This is shown in the first talk posted by Nick.
kevpie
I'm trying to develop the alternative using index entities. Performing get_by_key_name with a list of Facebook friends as an argument appears to be stressing the DataStore a bit too much. The problem with index entities associated to each user is when to update them to keep them consistent with Facebook.
abahgat
+2  A: 

This Google I/O talk by Brett Slatkin addresses the exact situation you're dealing with. See also his follow up talk this year.

Nick Johnson
I'm afraid you posted twice the same link to the last talk. Were you referring to this one? http://www.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html
abahgat
Yes, I was. Sorry, fixed.
Nick Johnson