Original Design

Here's how I originally had my Models set up:

from google.appengine.ext import db

class UserData(db.Model):
    user = db.UserProperty()
    favorites = db.ListProperty(db.Key) # list of story keys
    # ...

class Story(db.Model):
    title = db.StringProperty()
    # ...

On every page that displayed a story I would query UserData for the current user:

from google.appengine.api import users

user_data = UserData.all().filter('user =', users.get_current_user()).get()
story_is_favorited = (story in user_data.favorites)

New Design

After watching the talk "Google I/O 2009 - Scalable, Complex Apps on App Engine", I wondered whether I could set things up more efficiently.

class FavoriteIndex(db.Model):
    favorited_by = db.StringListProperty()

The Story model is unchanged, but I got rid of the UserData model. Each FavoriteIndex entity has a Story entity as its parent and stores a list of user IDs in its favorited_by property.

If I want to find all of the stories that have been favorited by a certain user:

index_keys = FavoriteIndex.all(keys_only=True).filter('favorited_by =', users.get_current_user().user_id())
story_keys = [k.parent() for k in index_keys]
stories = db.get(story_keys)

Because the query is keys-only, this approach avoids deserializing the favorited_by lists, which would otherwise happen every time an entity carrying a large ListProperty is fetched.
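To make the trade-off concrete, here is a minimal in-memory sketch in plain Python of how the relation-index lookup behaves. It is not the App Engine API: keys are modelled as tuples, and the datastore's built-in index on the favorited_by list is approximated by a dict (FavoriteIndexStore and its methods are hypothetical names). What it illustrates is that finding a user's favorites reads only the index, never the lists themselves.

```python
from collections import defaultdict

# Hypothetical in-memory model, NOT the App Engine API: keys are tuples,
# and the datastore's built-in index on a list property is a plain dict.
class FavoriteIndexStore:
    def __init__(self):
        # FavoriteIndex key -> its favorited_by list of user ids.
        # Each key is (story_key, "FavoriteIndex"), i.e. a child of the story.
        self.entities = {}
        # Stand-in for the datastore index: one row per (list element, key),
        # so an equality filter on favorited_by is a lookup, not a scan.
        self.index = defaultdict(set)

    def favorite(self, story_key, user_id):
        key = (story_key, "FavoriteIndex")
        favorited_by = self.entities.setdefault(key, [])
        if user_id not in favorited_by:
            favorited_by.append(user_id)
            self.index[user_id].add(key)

    def stories_favorited_by(self, user_id):
        # Like the keys-only query: read matching keys from the index only,
        # then take each key's parent (k.parent() in the original code).
        index_keys = self.index.get(user_id, set())
        return sorted(key[0] for key in index_keys)

store = FavoriteIndexStore()
store.favorite("story-1", "alice")
store.favorite("story-2", "alice")
store.favorite("story-2", "bob")
print(store.stories_favorited_by("alice"))  # ['story-1', 'story-2']
```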

Efficiency vs Simplicity

I'm not sure how efficient the new design is, especially after a user decides to favorite 300 stories, but here's why I like it:

  1. A favorited story is associated with a user, not with her user data

  2. On a page where I display a story, it's pretty easy to ask the story if it's been favorited (without calling up a separate entity filled with user data).

    fav_index = FavoriteIndex.all().ancestor(story).get()
    # fav_index is None if nobody has favorited this story yet
    fav_of_current_user = (fav_index is not None and
                           users.get_current_user().user_id() in fav_index.favorited_by)
    
  3. It's also easy to get a list of all the users who have favorited a story (using the method in #2)
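A plain-Python sketch of points 2 and 3, using a dict as a hypothetical stand-in for the FavoriteIndex entities (is_favorited_by and users_who_favorited are illustrative names, not part of any API). It also guards against a story that has no FavoriteIndex yet, a case the ancestor query's .get() reports as None.

```python
# Hypothetical in-memory stand-in, not the App Engine API: each story key
# maps to the favorited_by list of its single FavoriteIndex child.
fav_indexes = {
    "story-1": ["alice", "bob"],
    "story-2": ["bob"],
}

def is_favorited_by(story_key, user_id):
    # Point 2. The ancestor query's .get() returns None when nobody has
    # favorited the story yet, so guard against a missing index.
    favorited_by = fav_indexes.get(story_key)
    return favorited_by is not None and user_id in favorited_by

def users_who_favorited(story_key):
    # Point 3: the whole favorited_by list comes back with the index.
    return list(fav_indexes.get(story_key, []))

print(is_favorited_by("story-1", "alice"))  # True
print(is_favorited_by("story-3", "alice"))  # False: no index entity yet
print(users_who_favorited("story-2"))       # ['bob']
```

Both helpers read the story's entire favorited_by list, so for a popular story this brings back the deserialization cost that the keys-only query on the user side avoids.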

Is there an easier way?

Please help. How is this kind of thing normally done?

A:

I don't want to tackle your actual question, but here's a very small tip: you can replace this code:

if story in user_data.favorites:
    story_is_favorited = True
else:
    story_is_favorited = False

with this single line:

story_is_favorited = (story in user_data.favorites)

You don't even need to put the parentheses around the story in user_data.favorites if you don't want to; I just think that's more readable.

steveha
thank you for the advice :)
wings
You are very welcome. :-)
steveha
A:

What you've described is a good solution. You can optimise it further, however: For each favorite, create a 'UserFavorite' entity as a child entity of the relevant Story entry (or equivalently, as a child entity of a UserInfo entry), with the key name set to the user's unique ID. This way, you can determine if a user has favorited a story with a simple get:

UserFavorite.get_by_key_name(user_id, parent=a_story)

get operations are 3 to 5 times faster than queries, so this is a substantial improvement.
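A minimal in-memory sketch of this idea, assuming keys can be modelled as path tuples and the datastore as a dict (favorite_key, add_favorite and the other helpers are hypothetical). With the real db API the write would look roughly like UserFavorite(key_name=user_id, parent=a_story).put() and the check like the get_by_key_name call above. The sketch shows why the same user ID can serve as the key name under many stories, and why favoriting twice cannot create a duplicate.

```python
# Hypothetical sketch of the key-name approach, not the App Engine API:
# keys are modelled as path tuples and the datastore as a dict.
entities = {}

def favorite_key(story_id, user_id):
    # Parent story plus key name = user id. Key names only need to be unique
    # among entities of the same kind with the same parent, so "alice" can
    # appear under many stories.
    return ("Story", story_id, "UserFavorite", user_id)

def add_favorite(story_id, user_id):
    # Re-favoriting overwrites the same entity, so no duplicates appear.
    entities[favorite_key(story_id, user_id)] = {}

def remove_favorite(story_id, user_id):
    entities.pop(favorite_key(story_id, user_id), None)

def has_favorited(story_id, user_id):
    # A direct lookup by key, the analogue of get_by_key_name, not a query.
    return favorite_key(story_id, user_id) in entities

add_favorite(1, "alice")
add_favorite(2, "alice")
add_favorite(2, "alice")            # overwrites, still one entity
print(len(entities))                # 2
print(has_favorited(2, "alice"))    # True
remove_favorite(2, "alice")
print(has_favorited(2, "alice"))    # False
```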

Nick Johnson
Thank you very much, this is exactly what I was looking for. Btw, it should be get_by_key_name, not get_by_name.
wings
Two quick questions: (1) I thought a key's name had to be unique, but I imagine your example works because it's only the key itself that has to be unique? (2) Is my 'New Design' (scanning through the list of users in *all* favorite indexes to find a single user) really more efficient than just grabbing a set of keys, like I did in the 'Original Design'?
wings
Um, also: should the UserFavorite model have any properties?
wings
1) Yes, it's the key that has to be unique, not just the key name, 2) I'm not sure I understand your question - I don't see any 'scanning' going on, just queries. 3) UserFavorite doesn't need any properties unless you want to associate some data (such as creation timestamp) with the favorite.
Nick Johnson
1) It says in the docs that a key name has to be unique, but I was just saying that it doesn't have to be unique if it has a different parent from other entities of the same kind. 2) I just want to know if the new approach I described would scale with thousands of users and stories. It seems like the datastore must be doing a lot more work to find a user's favorite stories with the new approach as opposed to the original way I had it set up. Thank you very much for all of your answers :)
wings
1) Where in the docs does it say that? It certainly shouldn't state outright that key names have to be unique. 2) In a nutshell, all Datastore queries have cost proportional to the number of returned results - they're all equally efficient. I'd still recommend my enhancement, though. :)
Nick Johnson
In the introduction to "The Model Class" -- docs/python/datastore/modelclass.html -- it says: "Every entity has a key, a unique identifier that represents the entity. An entity can have an optional key name, a string unique across entities of the given kind."
wings
A:

You can make the favorite index a join between the two models:

class FavoriteIndex(db.Model):
    user = db.UserProperty()
    story = db.ReferenceProperty()

or

class FavoriteIndex(db.Model):
    user = db.UserProperty()
    story = db.StringListProperty()

Then a query by user returns one FavoriteIndex entity for each story the user has favorited.

You can also query by story to see how many users have favorited it.

You don't want to be scanning through anything unless you know it is limited to a small size.
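A small in-memory sketch of the first variant, where each FavoriteIndex row pairs one user with one story. The dataclass and the helper functions are hypothetical stand-ins; the comments name the datastore query each one roughly corresponds to, and the linear scans here only play the role of the index the datastore would use.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-in for the first variant: one FavoriteIndex
# row per (user, story) pair, like a join table between users and stories.
@dataclass(frozen=True)
class FavoriteIndex:
    user: str
    story: str

# A set, so adding the same (user, story) pair twice has no effect.
rows = {
    FavoriteIndex("alice", "story-1"),
    FavoriteIndex("alice", "story-2"),
    FavoriteIndex("bob", "story-2"),
}

def stories_for(user):
    # Roughly FavoriteIndex.all().filter('user =', user); the datastore would
    # answer this from an index, the comprehension only stands in for that.
    return sorted(r.story for r in rows if r.user == user)

def favorite_count(story):
    # Roughly FavoriteIndex.all().filter('story =', story).count()
    return sum(1 for r in rows if r.story == story)

print(stories_for("alice"))        # ['story-1', 'story-2']
print(favorite_count("story-2"))   # 2
```

Storing the rows in a set means favoriting the same story twice is a no-op here; the datastore version would have to enforce that itself.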

gnibbler
A:

With your new design you can look up whether a user has favorited a certain story with a query;
you don't need the UserFavorite entities.
It is a keys_only query, so it is not as fast as a get(key), but it is faster than a normal query.
All the FavoriteIndex entities have the same key_name='favs'.
You can filter based on __key__.

a_story = ......
a_user_id  = users.get_current_user().user_id()
favIndexKey = db.Key.from_path('Story', a_story.key().id_or_name(), 'FavoriteIndex', 'favs')
doesFavStory = FavoriteIndex.all(keys_only=True).filter('__key__ =', favIndexKey).filter('favorited_by =', a_user_id).get()

If you use multiple FavoriteIndex entities as children of a Story, you can use an ancestor filter:

doesFavStory = FavoriteIndex.all(keys_only=True).ancestor(a_story).filter('favorited_by =', a_user_id).get()
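A plain-Python sketch of the multi-index variant, assuming a story's user IDs are split across several child FavoriteIndex entities, for instance to keep each list small. MAX_IDS_PER_INDEX and the helper names are hypothetical, and the cap of 2 is only there to make the split visible.

```python
# Hypothetical sketch: a story's favorited_by ids are split across several
# child FavoriteIndex entities, each capped at MAX_IDS_PER_INDEX (an
# arbitrary illustrative limit, not a documented datastore value).
MAX_IDS_PER_INDEX = 2

# story key -> list of child indexes; each child index is a list of user ids
children = {}

def add_favorite(story_key, user_id):
    indexes = children.setdefault(story_key, [])
    if any(user_id in idx for idx in indexes):
        return  # already favorited
    if not indexes or len(indexes[-1]) >= MAX_IDS_PER_INDEX:
        indexes.append([])  # start a new child FavoriteIndex
    indexes[-1].append(user_id)

def does_fav_story(story_key, user_id):
    # Ancestor filter (only this story's children) plus the equality filter
    # on favorited_by: true if any child index lists the user.
    return any(user_id in idx for idx in children.get(story_key, []))

for user in ("alice", "bob", "carol"):
    add_favorite("story-1", user)
print(children["story-1"])                 # [['alice', 'bob'], ['carol']]
print(does_fav_story("story-1", "carol"))  # True
print(does_fav_story("story-2", "alice"))  # False
```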
marioddd