views:

15

answers:

1

My reading is limited as of yet, but so far here are some key points I have identified for using the GAE Datastore:

  • It is not a relational database.
  • Data duplication occurs by default across storage space.
  • You cannot 'join' tables at the datastore level.
  • Optimized for reads with less frequent writes.

These lead me to the following data model for a Blog System:

Blogs have a relatively known set of 'columns': id, date, author, content, rating, tags. The Datastore allows for additional columns as desired, but it is known that the likelihood of adding additional columns on the fly will be rare as it requires more backend specialized coding as well as more thought to the entire blog system.

What Blogs do not have are a set number of Comments and Tags. In a traditional relational db structure, these are mapped through a join. Since these are not possible in GAE, I have thought about implementing the following:

  • Articles -> ID, Author, Date, Title, Content, Rating, Tags
  • Comments -> Article_ID, Author, Date, Content, Rating
  • Tags -> Tag, Article IDs

Example:

Article- 1 - Administrator - 01/01/2011 - Questions? - Answers… - 5 - questions, answers, speculations, ruminations 2 - Administrator - 01/05/2011 - Who knows? - Not me! - 10 - questions

Comments- 1 - John Smith - 01/02/2011 - Stupid, stupid, stupid.. - 0 1 - Jane Doe - 01/03/2011 - Smart, smart, smart.. - 5

Tags- questions - 1, 2 answers - 1 speculations - 1 ruminations - 1

Now, this is my reasoning. When browsing a Blog you do so by: Date, Author, Tag / Topic, Rating, Comments, etc. Date, Author, and Rating are static and so can easily reside in a single table along with the article in question.

Tags are duplicated between the tags 'table' and the article 'table', but consistency here is handled at the application level and the tags are left in to eliminate a join at the application level when sending articles to the viewer. The Tags table is used in order to search by tag. The list of articles is then parsed at application level and then it retrieves these articles through an application call.

The same thing is going to happen with the comments. The join will occur at the application level through an extra method call passing a retrieved article ID.

Now, why do I want to process a join at the application level? I had thought about inserting everything into each article, adding comments as they were created, but got to thinking about the time complexity of sorting and searching once a blog was into the thousands of articles, as well as the limitations on size of returns, not knowing how large articles / comments might become. I haven't tested, but in thinking of the time complexity, I began to conclude that article retrieval would grow linearly to number of articles when attempting to search these articles by tags. Am I correct in this and is this approach a way to overcome that? Also, in general does this data model look like a way to validly implement persistent data storage in GAE?

Thanks, Trying to wrap my head around it...

+1  A: 

Your approach sounds pretty reasonable. Retrieving articles by tag is most easily achieved by having a ListProperty of tags on the article, and filtering on that - which will take time proportional to the number of results returned, not to the number in the datastore - and you're right that you should keep a separate set of 'tag' entities so that you can list all the tags in use separately.

You may want to check out my series of posts on writing a blogging system on App Engine.

Nick Johnson