views:

170

answers:

1

I am going to start designing a web app shortly, and while I have lots of experience doing it in the SQL world, I have no idea what I need to take into consideration for doing so with the goal of migrating to GAE in the very near future.

Alternatively, I could design the app for GAE from the start, and so in that case, what are the differences I need to take into consideration? In other words, what are the DOs and DONTs of writing your app for GAE, coming from a relational databases past.

+6  A: 

Just out of top of my head:

  • It's really ONLY a key->value store, don't be fooled by things like GQL (which is just a subset of SQL SELECT)
  • No JOINs - often you have to denormalize or forget
  • More or less frequent timeouts
  • (Very) slow access comparing to local SQL base.
  • COUNT very expensive
  • OFFSET (in SELECT) implemented on client side - so in fact you fetch all records up to offset - as pointed by Nick Johnson in one of the comments, it's not client side, so now, as LIMIT of 1000 is gone it's similar to SQL databases.
  • (Recently removed) LIMIT of 1000 fetched rows
  • SELECT performance decreases drastically with increasing number of returned rows
  • Migrations are hard to do because you have to do them using normal http requests and each request is killed after 30 secs. You have to resort to task queues that process rows in batches
  • There are pseudo foreign keys - called ReferenceProperties in Python API - but they are not enforced in any way - if someone/something delete target object then you have what is know as dangling pointer in C++
  • Fields that you use for queries have to be indexed, but still using key (sort of primary key for each row/instance) makes your queries run faster
  • Building indexes on live instance can take a lot of time (and you can't decrease it) and without them your app often can't work. Beer and patience highly recommended..
  • A LOT of artificial limits (like the already removed max LIMIT of 1000). E.g. GQL 'IN' operator is just syntactic sugar for multiple OR-s, and there is upper limit of 30 values used.

All that means that you probably can't avoid exposing inconsistent state to user, and almost for sure you cannot avoid having inconsistent state of your data (e.g. half rows migrated and half not, during manual JOIN data changes etc)

Tomasz Zielinski
Very, very good summary of the pitfalls that *will* be encountered.
Will McCutchen
The App Engine datastore isn't 'only' a key->value store - it implements indexing. It's also not true that offset is implemented client-side - it's handled by the datastore, but the datastore has to read and skip the appropriate number of index rows (as does every other SQL DB, really). The implementation of IN isn't an 'artificial limit', either.
Nick Johnson
Ok, so it's key->value store with indexing (without which it would be unusable). Re offset - my bad, until recently it was problematic because of `1000 limit`, corrected. Re IN - for me 30 looks like artificial limitation (of course I realize it's because of some technical trade-off).
Tomasz Zielinski