I'm wondering what would be the best way to design a social application where members create activities and follow other members' activities using Google AppEngine.

To be more specific, let's assume we have these entities:

  • Users who have friends
  • Activities which represent actions made by users (let's say each has a string message and a ReferenceProperty to its owner user, or it can use parent association via AppEngine's key)

The hard part is following your friends' activities, which means aggregating the latest activities from all your friends. Normally, that would be a join between the Activities table and your friends list, but that's not a viable design on AppEngine because there are no joins. Simulating one would require firing N queries (where N is the number of friends) and then merging the results in memory, which is very expensive and will probably exceed the request deadline.
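To make the cost of the "N queries, then merge" approach concrete, here is a minimal pure-Python sketch. It does not use the App Engine API: the `ACTIVITIES` dict and the `query_latest` helper are hypothetical stand-ins for the datastore and for one per-friend query.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-in for the datastore:
# owner -> list of (timestamp, message), already sorted newest first.
ACTIVITIES = {
    "alice": [(30, "alice ran 5k"), (10, "alice joined")],
    "bob": [(25, "bob posted a photo")],
    "carol": [(40, "carol checked in"), (5, "carol joined")],
}


def query_latest(owner, limit):
    """Simulates one per-friend query: the newest `limit` activities of `owner`."""
    return ACTIVITIES.get(owner, [])[:limit]


def feed_on_read(friends, limit=3):
    """Runs one query per friend (N queries), then merges the results in memory."""
    per_friend = [query_latest(friend, limit) for friend in friends]
    # Each input list is sorted newest first, so merge in descending timestamp order.
    merged = heapq.merge(*per_friend, key=lambda activity: activity[0], reverse=True)
    return list(islice(merged, limit))


feed = feed_on_read(["alice", "bob", "carol"])
```

The merge itself is cheap; the problem is that the number of queries grows with the number of friends, and every feed read pays that cost again.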

I'm currently thinking of implementing this using inbox queues where creation of a new Activity will fire a background process that will put the new activity's key in the "inbox" of every following user:

  • Getting "All the users who follow X" is a possible appengine query
  • Filling the inboxes is a fairly cheap batch put into a new "Inbox" entity that basically stores (User, Activity Key) tuples.
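The inbox idea above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the dicts stand in for datastore entities, and `followers_of`, `create_activity`, `fan_out` and `read_inbox` are hypothetical names, not App Engine calls. In a real app the fan-out step would run in a background task rather than inline.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for datastore entities.
FOLLOWERS = {"alice": ["bob", "carol"], "bob": ["alice"]}  # user -> followers
ACTIVITY_STORE = {}                                        # activity key -> activity
INBOX = defaultdict(list)                                  # user -> activity keys


def followers_of(user):
    """Stands in for the 'all the users who follow X' query."""
    return FOLLOWERS.get(user, [])


def create_activity(key, owner, message):
    """Stores the activity once; only its key is copied into inboxes."""
    ACTIVITY_STORE[key] = {"owner": owner, "message": message}
    return key


def fan_out(activity_key):
    """Background step: put the activity key into every follower's inbox."""
    owner = ACTIVITY_STORE[activity_key]["owner"]
    for follower in followers_of(owner):
        INBOX[follower].append(activity_key)


def read_inbox(user, limit=20):
    """Reads a feed with one inbox lookup plus one batch get, newest first."""
    keys = INBOX[user][-limit:][::-1]
    return [ACTIVITY_STORE[k]["message"] for k in keys]


fan_out(create_activity("a1", "alice", "alice ran 5k"))
fan_out(create_activity("a2", "bob", "bob posted a photo"))
```

The trade-off is that writes become more expensive (one inbox entry per follower) so that reads stay cheap and independent of how many people a user follows.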

I'd be happy to hear thoughts on this design or alternative suggestions.

+11  A: 

Take a look at Building Scalable, Complex Apps on App Engine (pdf), a fascinating talk given at Google I/O by Brett Slatkin. He addresses the problem of building a scalable messaging service like Twitter.

Here's his solution using a list property:

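from google.appengine.ext import db

class Message(db.Model):
    sender = db.StringProperty()
    body = db.TextProperty()

class MessageIndex(db.Model):
    # parent = a Message entity
    receivers = db.StringListProperty()

# Keys-only query: find the index entities that list user_id as a receiver.
indexes = MessageIndex.all(keys_only=True).filter('receivers =', user_id)
# The parent key of each index entity is the key of its message.
keys = [k.parent() for k in indexes]
messages = db.get(keys)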

This keys-only query finds the message indexes whose receivers list contains the user you specified, without deserializing the list of receivers. You then use the parent keys of those indexes to fetch only the messages you want.

Here's the wrong way to do it:

class Message(db.Model):
    sender = db.StringProperty()
    receivers = db.StringListProperty()
    body = db.TextProperty()

messages = Message.all().filter('receivers =', user_id)

This is inefficient because the datastore has to deserialize every entity returned by your query, including its full receivers list. So if you returned 100 messages with 1,000 users in each receivers list, you'd have to deserialize 100,000 (100 x 1000) list property values. That is way too expensive in datastore latency and CPU.
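The arithmetic behind that comparison can be written out as a small sketch. This is a simplified cost model, not a measurement: the two helper functions are hypothetical and only count list values that get deserialized when reading results.

```python
def values_deserialized_embedded(num_messages, receivers_per_message):
    """Receivers stored on the message: each fetched entity carries its full list."""
    return num_messages * receivers_per_message


def values_deserialized_keys_only(num_messages, receivers_per_message):
    """Keys-only index query: only keys come back, and the fetched messages
    have no receivers list, so no list values are deserialized."""
    return 0


embedded_cost = values_deserialized_embedded(100, 1000)
keys_only_cost = values_deserialized_keys_only(100, 1000)
```

Under this model the embedded design reads 100,000 list values for the example above, while the separate index entity reads none.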

I was pretty confused by all of this at first, so I wrote up a short tutorial about using the list property. Enjoy :)

wings
Exactly my initial design. But what I understood from that talk and from the AppEngine documentation is that lists are pretty useless when it comes to IN queries. The query you mentioned will fire several queries in Google's system, each filtering by one of the values in the list property, and then merge the results. Google caps this kind of query at 30 simultaneous sub-queries, which means it can only be used for lists that contain a relatively small number of keys (<30). When it comes to friends, this list could contain tens if not hundreds (or thousands?) of keys for the people you're following.
Eran Kampf
btw I asked you that same question regarding lists in another StackOverflow question you posted :)
Eran Kampf
I don't think that's right. Brett says that you're limited to 5000 indexed properties per entity when he's talking about list property performance (see 14:15 in the video). I think you should be able to have thousands of users in a receivers StringListProperty while still being able to perform an efficient query. I'm not sure what the line "A single query containing != or IN operators is limited to 30 sub-queries" means, but I'm positive it doesn't affect what you want to do here.
wings
What is the difference between the GQL IN operator (which according to the docs is limited to 30 items) and the filter used here (which according to the presentation will work for large lists)?
Eran Kampf
I don't know, sorry. Nick Johnson would probably know; if he doesn't show up here, try asking this in the App Engine group (http://groups.google.com/group/google-appengine). Or you could just try it out. I mean, if it actually works and has decent performance, that's all that matters.
wings
Just went over the presentation again and had my "A-Ha!" moment! Thanks... this is exactly the solution I need. Instead of saving a Friends list in the User entity, I should save a "followed by" (recipients) list...
Eran Kampf
'IN' is "find any entity containing one of these values", which gets split out into n "find any entity containing this exact value" queries. An equality filter on a list, however, is the latter - "find any entity that has this exact value (possibly in a list)", and so isn't inefficient in the same way an IN query is.
Nick Johnson
By the way, good summary of the state-of-the-art, wings. :)
Nick Johnson
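Nick Johnson's distinction between IN and an equality filter on a list can be illustrated with a small in-memory model. This is a sketch, not the datastore's implementation: `INDEX_ENTITIES`, `equality_filter` and `in_filter` are hypothetical names, and the 30 sub-query cap is taken from the documentation line quoted in the comments above.

```python
# Hypothetical in-memory index entities: (message key, receivers list).
INDEX_ENTITIES = [
    ("m1", ["alice", "bob"]),
    ("m2", ["carol"]),
    ("m3", ["bob", "dave"]),
]


def equality_filter(value):
    """One query: entities whose list property contains this exact value.
    The size of each receivers list does not change the number of queries."""
    return [key for key, receivers in INDEX_ENTITIES if value in receivers]


def in_filter(values, max_subqueries=30):
    """IN modeled as one equality sub-query per value, merged and de-duplicated."""
    if len(values) > max_subqueries:
        raise ValueError("IN is limited to %d sub-queries" % max_subqueries)
    seen, merged = set(), []
    for value in values:
        for key in equality_filter(value):
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


for_bob = equality_filter("bob")                # a single query
for_alice_or_bob = in_filter(["alice", "bob"])  # two sub-queries, merged
```

In the feed design the query is always "messages where receivers contains me", so it is the single equality case regardless of how many receivers each message has.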
+2  A: 

I was looking at the same problem and found this excellent(!) presentation from the App Engine team, which they gave at Google I/O:

http://www.scribd.com/doc/16952419/Building-scalable-complex-apps-on-App-Engine

I hope you'll find it useful too.

Gregor Hochmuth
+4  A: 

I don't know whether it is the best design for a social application, but Jaiku was ported to App Engine by its original creator when the company was acquired by Google, so it should be reasonable.

See the section Actors and Tigers and Bears, Oh My! in design_funument.txt. The entities are defined in common/models.py and the queries are in common/api.py.

Thanks a lot! that code is a great reference...
Eran Kampf