views:

269

answers:

3

One of the nice things relational databases support are the aggregate functions like count, sum, avg etc. But it seems that if you are using GAE, when inserting or updating a record you must calculate and store the count, sum, avg, etc. values of the whole table. But what if you have many conditional groupings? Given a Person:

class Person {
    @Id
    Integer age;
    String city;
}

If I want

  1. the total number of persons and
  2. the average age

Is it correct that everytime I create, update or delete a person I should also calculate both aggregates and store them as separate columns in the same table. If I also want the total and average values per city, should I store these values for each city also as separate columns in the same table?

+2  A: 

Right: to use GAE storage properly, you need to selectively de-normalize some aspects of your model, keeping "redundant" data which, in a DB in normal form, you'd recompute on the fly, such as aggregates (overall and "grouped by" ones).

However, don't add such fields to the Person table in your case -- that makes little sense! Make another PersonAggregates table with columns such as City (null/missing for overall totals), Count, TotalAges (easier to maintain: compute the average at any time as total divided by count).

Alex Martelli
A: 

For frequently used aggregates the best is to update them on every update/insert/delete.

If you haven't designed such aggregates into your application from the start, you can run a script via Remote DataStore API or set up a server-side cron job that will process all entities and calculate the aggregates. It is fairly easy, just keep in mind per-request CPU quota.

Constantin
A: 

not be able to do that is another reason to think twice about using GAE, I didn't expect that from Google.

The other reason is that GAE does not support naked domains

wiibart