views:

853

answers:

4

Example Problem:

Entities:

  • User contains name and a list of friends (User references)
  • Blog Post contains title, content, date and Writer (User)

Requirement:

I want a page that displays the title and a link to the blog of the last 10 posts by a user's friend. I would also like the ability to keep paging back through older entries.

SQL Solution:

So in sql land it would be something like:

select * from blog_post where user_id in (select friend_id from user_friend where user_id = :userId) order by date

GAE solutions i can think of are:

  • Load user, loop through the list of friends and load their latest blog posts. Finally merge all the blog posts to find the latest 10 blog entries
  • In a blog post have a list of all users that have the writer as a friend. This would mean a simple read but would result in quota overload when adding a friend who has lots of blog posts.

I don't believe either of these solutions will scale.

Im sure others have hit this problem but I've searched, watched google io videos, read other's code ... What am i missing?

+11  A: 

If you look at how the SQL solution you provided will be executed, it will go basically like this:

  1. Fetch a list of friends for the current user
  2. For each user in the list, start an index scan over recent posts
  3. Merge-join all the scans from step 2, stopping when you've retrieved enough entries

You can carry out exactly the same procedure yourself in App Engine, by using the Query instances as iterators and doing a merge join over them.

You're right that this will not scale well to large numbers of friends, but it suffers from exactly the same issues the SQL implementation has, it just doesn't disguise them as well: Fetching the latest 20 (for example) entries costs roughly O(n log n) work, where n is the number of friends.

Nick Johnson
So if i need this feature, i should loop and sort myself, take whatever cpu hit is required and memcache the results. This would be considered "Best Practice"?
Sam
Pretty much, yes. If you want 20 or fewer results, I would suggest just fetching the first 20 from each user, sorting, and taking the first 20 of the result. If you want more, implement a proper merge sort, using the queries as iterators.
Nick Johnson
+1  A: 

"Load user, loop through the list of friends and load their latest blog posts."

That's all a join is -- nested loops. Some kinds of joins are loops with lookups. Most lookups are just loops; some are hashes.

"Finally merge all the blog posts to find the latest 10 blog entries"

That's a ORDER BY with a LIMIT. That's what the database is doing for you.

I'm not sure what's not scalable about this; it's what a database does anyway.

S.Lott
Yes, but the database KNOWS it's doing that. Therefore, it can optimise it.
Lee B
+4  A: 

This topic is covered in a Google io talk: http://code.google.com/events/io/sessions/BuildingScalableComplexApps.html

Basically the Google team suggest using list properties and what they call relational index entities, an example application can be found here: http://pubsub-test.appspot.com/

Sam
A: 

Here is an example in python gleamed from http://pubsub-test.appspot.com/:

Anyone have one for java? Thanks.

from google.appengine.ext import webapp

from google.appengine.ext import db

class Message(db.Model): body = db.TextProperty(required=True) sender = db.StringProperty(required=True) receiver_id = db.ListProperty(int)

class SlimMessage(db.Model): body = db.TextProperty(required=True) sender = db.StringProperty(required=True)

class MessageIndex(db.Model):
receiver_id = db.ListProperty(int)

class MainHandler(webapp.RequestHandler):

def get(self):

receiver_id = int(self.request.get('receiver_id', '1'))

key_only = self.request.get('key_only').lower() == 'on'

if receiver_id:

  if key_only:

      keys = db.GqlQuery(

          'SELECT __key__ FROM MessageIndex WHERE receiver_id = :1',

          receiver_id).fetch(10)

      messages.extend(db.get([k.parent() for k in keys]))

  else:

      messages.extend(Message.gql('WHERE receiver_id = :1',

                      receiver_id).fetch(10))
devadvocate