views:

3725

answers:

4

Since the Google App Engine Datastore is based on Bigtable and we know that's not a relational database, how do you design a database schema/data model for applications that use this type of database system?

A: 

As GAE builds on how data is managed in Django there is a lot of info on how to address similar questions in the Django documentation (for example see here, scroll down to 'Your first model').

In short you design you db model as a regular object model and let GAE sort out all of the object-relational mappings.

Mingus Rude
+11  A: 

Designing a bigtable schema is an open process, and basically requires you to think about:

  • The access patterns you will be using and how often each will be used
  • The relationships between your types
  • What indices you are going to need
  • The write patterns you will be using (in order to effectively spread load)

GAE's datastore automatically denormalizes your data. That is, each index contains a (mostly) complete copy of the data, and thus every index adds significantly to time taken to perform a write, and the storage space used.

If this were not the case, designing a Datastore schema would be a lot more work: You would have to think carefully about the primary key for each type, and consider the effect of your decision on the locality of data. For example, when rendering a blog post you would probably need to display the comments to go along with it, so each comment's key would probably begin with the associated post's key.

With Datastore, this is not such a big deal: The query you use will look something like "Select * FROM Comment WHERE post_id = N." (If you want to page the comments, you would also have a limit clause, and a possible suffix of " AND comment_id > last_comment_id".) Once you add such a query, Datastore will build the index for you, and your reads will be magically fast.

Something to keep in mind is that each additional index creates some additional cost: it is best if you can use as few access patterns as possible, since it will reduce the number of indices GAE will construct, and thus the total storage required by your data.

Reading over this answer, I find it a little vague. Maybe a hands-on design question would help to scope this down? :-)

0124816
The GAE datastore doesn't "automatically denormalise your data". It automatically _indexes_ most columns, but that's totally different.
Nick Johnson
The datastore also doesn't automatically create indexes for you - the dev appserver does. If you fail to exercise a particular query in the dev appserver, then it won't add the appropriate entry to indexes.yaml, and the query will fail when deployed.
Nick Johnson
I've read somewhere that each index contains a copy of _all_ fields to be returned; however the docs now state that only those columns that affect the index are stored within it. AFAICT there is nowhere that states if an index read is a simple scan, or scan + lookup. Have you seen anything on that?
0124816
+1  A: 

See this question, which asks more or less the same thing.

Nick Johnson
+1  A: 

You can use www.web2py.com. You build the model and the application once and it works on GAE but also witl SQLite, MySQL, Posgres, Oracle, MSSQL, FireBird

massimo