In our system, every user can send a message to any other user. The first obvious idea is a data model like this:

User
 username
 email
 ... more properties

Message
 user_from_FK
 user_to_FK
 text
 creation-date
 ... more properties
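
On App Engine this could be expressed with the Python datastore API roughly as follows. This is a minimal sketch, not the definitive model: property names mirror the sketch above (creation_date instead of creation-date, since dashes are not valid Python identifiers), and db.ReferenceProperty plays the role of the FKs:

from google.appengine.ext import db

class User(db.Model):
    username = db.StringProperty(required=True)
    email = db.EmailProperty()
    # ... more properties

class Message(db.Model):
    # ReferenceProperty is the datastore's counterpart of a FK
    user_from = db.ReferenceProperty(User, collection_name='messages_sent')
    user_to = db.ReferenceProperty(User, collection_name='messages_received')
    text = db.TextProperty()
    creation_date = db.DateTimeProperty(auto_now_add=True)
    # ... more properties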

So a message stores the user's key as a FK, as in a traditional database. For simplicity, visualised as tables:

User-"Table"

KEY  username ...
-----------------
1    peter
2    paul

Message-"Table":

KEY   user_from_FK  user_to_FK  creation-date  text ...
-------------------------------------------------------
11    1             2           2342342342234  Hi Paul.
22    1             2           2342342356455  Hi Paul. You got my message?
33    2             1           2342342377544  Hi Peter. Yes, I did.

Querying all messages for a given user is then simple:

SELECT __key__ FROM Message WHERE user_to_FK = :userKey ORDER BY creation-date
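
In Python, against the model sketched above, the same keys-only query might look like this (user_key is assumed to be the recipient's Key, and creation_date is the property from the Python sketch):

q = db.GqlQuery("SELECT __key__ FROM Message "
                "WHERE user_to = :1 "
                "ORDER BY creation_date", user_key)
page = q.fetch(10)  # first 10 message keys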

Our system should scale to millions of users and millions of messages. Maybe 500 messages will be sent per second. Is this simple solution a good data model, then? Can we do better? (No user is allowed to have more than 1000 messages in their inbox. Messages should be sorted by date when returned, and we want to do paging.)

+4  A: 

That should work fine for storing/retrieving data. The puts and fetches won't block each other, if that's what you're worried about.

You might want to store more data for display in the Message model, though, since you can't JOIN data with the datastore. For example, you could store the name of the sender in the Message model.
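
For example (a hypothetical extension of the Message sketch above, with peter and paul being User entities), the sender's username could be copied into each message at write time, so rendering the inbox never needs an extra get per row:

class Message(db.Model):
    # ... properties as above, plus a denormalized copy:
    user_from_name = db.StringProperty()

msg = Message(user_from=peter, user_to=paul,
              user_from_name=peter.username,  # copied at write time, not joined
              text='Hi Paul.')
msg.put()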

For paging, you could benefit from adding to the structure slightly. Have a look at the following article for information on how you would do it: Paging through large datasets

Also have a look at how to do paging without changing the structure: Efficient paging using key instead of a dedicated unique property
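
A rough sketch of that key-based technique, under the same model assumptions: creation_date is not unique, so the entity key acts as a tie-breaker, which (as the comments below note) costs two queries per page after the first.

PAGE_SIZE = 10

def fetch_page(user_key, last_date=None, last_key=None):
    # First page: no bookmark yet.
    if last_date is None:
        q = (Message.all()
             .filter('user_to =', user_key)
             .order('creation_date')
             .order('__key__'))
        return q.fetch(PAGE_SIZE)
    # 1) Messages sharing the bookmark's date, beyond the bookmark key.
    q1 = (Message.all()
          .filter('user_to =', user_key)
          .filter('creation_date =', last_date)
          .filter('__key__ >', last_key)
          .order('__key__'))
    page = q1.fetch(PAGE_SIZE)
    # 2) Fill up with messages strictly after the bookmark's date.
    if len(page) < PAGE_SIZE:
        q2 = (Message.all()
              .filter('user_to =', user_key)
              .filter('creation_date >', last_date)
              .order('creation_date')
              .order('__key__'))
        page += q2.fetch(PAGE_SIZE - len(page))
    return page

The bookmark for the next page is simply the creation_date and key of the last entity returned.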

Blixt
I disagree that paging will be "very inefficient". It will be somewhat less efficient to page on creation-date+key than it would be to page on a single unique property, taking two queries per page instead of 1. So changes might be nice, but I don't think they're actually needed.
Steve Jessop
Oh? Imagine a user has 1000 messages. Now fetch the last page of 10 messages using `offset=990`. You're getting 1000 entities from the datastore and throwing away the first 990! That will be "very inefficient".
Blixt
Sorry, I've now reread your comment and see that you meant using the method described in the article. In that case you're right that the performance difference won't be very large.
Blixt
I thought it was worth linking to an article discussing the method mentioned by onebyone: http://google-appengine.googlegroups.com/web/efficient_paging_using_key_instead_of_a_dedicated_unique_property.txt
Blixt
+1. Good advice.
Nick Johnson
@Blixt: gosh, yes, I thought it went without saying not to page using offset, but of course you're absolutely right that every new GAE programmer should be explicitly told "never do that" :-)
Steve Jessop
A: 

Sorry, one piece of information in my question was wrong: it's not 500 messages per second being written, but per minute! But nice to hear that the performance is fine and the data model works. Great GAE :)

You can edit your original post with the updated information.
Nick Johnson
A: 

You should consider denormalization to guarantee scalability. Take a look at this article at highscalability.com and the related articles.

It is the price to pay when you want to scale efficiently across a large number of machines :)
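
As a rough illustration of what that price looks like here (hypothetical, building on the sketches above): you pay with extra work and duplicated data at write time, so that reading an inbox is a single cheap query that already contains everything needed for display:

def send_message(sender, recipient, text):
    # Duplicate display data into the message at write time...
    Message(user_from=sender, user_to=recipient,
            user_from_name=sender.username,
            text=text).put()

def read_inbox(user_key, page_size=10):
    # ...so reading never has to dereference or "join" anything.
    return (Message.all()
            .filter('user_to =', user_key)
            .order('-creation_date')
            .fetch(page_size))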

Guido