I'm writing a simple forum-like application on Google App Engine and trying to avoid scalability issues. I'm new to this non-RDBMS approach, and I'd like to avoid pitfalls from the beginning.
The forum design is pretty simple: posts and replies will be the only concepts. What is the best approach if the forum has millions of posts?

The model so far (stripped of irrelevant properties):

from google.appengine.ext import db

class Message(db.Model):
    user = db.StringProperty()  # will be a Google account user_id
    text = db.TextProperty()  # the text of the message
    reply_to = db.SelfReferenceProperty()  # None for a post, set for a reply (also allows reply-to-reply)

Alternatively, I could split the model. I think this would be faster, because retrieving "all posts" would query fewer items:

class Post(db.Model):  
    user = db.StringProperty() # will be a google account user_id  
    text = db.TextProperty() # the text of the message  

class Reply(db.Model):  
    user = db.StringProperty() # will be a google account user_id  
    text = db.TextProperty() # the text of the message  
    reply_to = db.ReferenceProperty(Post)  

In the RDBMS world this is a many-to-one relation. Should a ListProperty be used instead? If so, how?

Edit:

Jaiku uses something like this:

class StreamEntry(DeletedMarkerModel):  
...  
    entry = models.StringProperty()     # ref - the parent of this, should it be a comment  
...
A: 

Firstly, why don't you use user = db.UserProperty() instead of user = db.StringProperty()?

Secondly, I'm quite sure you should use whatever works and is most readable, and test the performance later, for three reasons:

  1. KISS (Keep it simple)
  2. Premature optimization is bad
  3. You can't improve what you can't measure

So when you are ready to measure, then start the optimizations.

I'm not saying this because I know nothing about RDBMS, NoSQL DBMS or Google Datastore performance optimizations, but because I usually get my knowledge about them from testing, which contradicts my previous assumptions more often than I expected.
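
As a rough illustration of measuring before optimizing, here is a minimal sketch that times "all posts" against a combined collection and against a split one. It assumes plain in-memory lists with made-up data and hypothetical function names, so it says nothing about real Datastore performance; it only shows the shape of such a measurement.

```python
import timeit

# Hypothetical in-memory data: 1,000 posts, each followed by 9 replies.
combined = [
    {"id": i, "reply_to": None if i % 10 == 0 else i - i % 10}
    for i in range(10000)
]
# The "split" design keeps posts in their own collection.
posts_only = [m for m in combined if m["reply_to"] is None]


def all_posts_combined():
    """Single-model design: select posts by filtering on reply_to."""
    return [m for m in combined if m["reply_to"] is None]


def all_posts_split():
    """Split design: posts already live in a separate collection."""
    return list(posts_only)


t_combined = timeit.timeit(all_posts_combined, number=100)
t_split = timeit.timeit(all_posts_split, number=100)
print("combined: %.4fs, split: %.4fs" % (t_combined, t_split))
```

The same pattern, run against the real Datastore with realistic data volumes, is what would actually tell you whether splitting the model is worth it.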

Jader Dias
I know that premature optimizations are generally bad, but I'm trying to wrap my mind around this new architecture, learn the best practices, etc. Also, the model design will leak into everything, from views to CSS, so I'd better get it right from the start :) I'm using the string property because db.UserProperty isn't guaranteed to be static: if a user changes their email address the user entity will be different, while the user_id is guaranteed to be permanent.
Spear
In my applications I don't take into consideration the possibility of a user changing their e-mail address. As for the design, my best bet is the first model you proposed.
Jader Dias
A: 

You might want to take a look at a good tutorial on creating a PHP forum from scratch. Sure, that one is about PHP, but it also covers the general overview of forum design.

Basically, don't split posts and replies or threads and posts. It will lead to some really awkward queries later on. A thread is simply a post that isn't replying to anything.
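
A minimal sketch of that idea, assuming plain Python objects instead of Datastore entities (the `Message` dataclass and the helper names `threads` and `replies_to` are hypothetical, for illustration only):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    id: int
    user: str
    text: str
    reply_to: Optional[int] = None  # None means this message starts a thread


def threads(messages: List[Message]) -> List[Message]:
    """Return messages that are not replies, i.e. thread starters."""
    return [m for m in messages if m.reply_to is None]


def replies_to(messages: List[Message], parent_id: int) -> List[Message]:
    """Return the direct replies to the message with the given id."""
    return [m for m in messages if m.reply_to == parent_id]


messages = [
    Message(1, "alice", "First thread"),
    Message(2, "bob", "Reply to the first thread", reply_to=1),
    Message(3, "carol", "Reply to bob's reply", reply_to=2),
    Message(4, "dave", "Second thread"),
]

print([m.id for m in threads(messages)])
print([m.id for m in replies_to(messages, 1)])
```

Both lookups work on one collection and differ only in the condition on `reply_to`. With the single `Message` model from the question, they would presumably become two queries that filter on the `reply_to` property.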

cletus