views:

382

answers:

6

I'm building an app on GAE that needs to report on events occurring. An event has a type and I also need to report by event type.

For example, say there is an event A, B and C. They occur periodically at random. User logs in and creates a set of entities to which those events can be attributed. When the user comes back to check the status, I need to be able to tell how many events of A, B and/or C occurred during a specific time range, say a day or a month.

The 1000 limit is throwing a wrench into how I would normally do it. I don't need to retrieve all of the entities and present them to the user, but I do need to show the total count for a specific date range. Any suggestions?

I'm a bit of python/GAE noob...

A: 

My approach would be to have an aggregate model or models to keep track of event types, dates and counts. I'm not 100% how you should model this given your requirements, though.

Then, I'd fire off deferred tasks to asynchronously update the appropriate aggregate models whenever a user does something that triggers an event.

Nick Johnson's Background work with the deferred library article has much more information, and provides a framework that you might find useful for doing the kind of aggregation you're talking about.

Will McCutchen
Thank you! I'll definitely check it out.Main issue is that aggregate counts do not work well with reporting needs, especially when filtering data.As previous answer mentioned, a relational db would be ideal, but I'm thinking there has to be a way to accomplish something similar.
Linsidious
I also forgot to add a bit about averages.One way I was thinking this could be accomplished is to use a loop and "page through" the data. Each time it returns, do a sum of all values and increment counter. Then once it returned the last value, take the total sum and divide by the counter.However, I do not know if this will work with the .filter properties well or the GQL. Any ideas?
Linsidious
+2  A: 

App Engine is not a relational database and you won't be able to quickly do counts on the fly like this. The best approach is to update the counts at write time, not generate them at read time.

When generating counts, there are two general approaches that scale well with App Engine to minimize write contention:

  1. Store the count in Memcache or local memory and periodically flush. This is the simplest solution, but it can be volatile and data loss is probable.
  2. Use a Sharded Counter. This approach is a bit more reliable but more complex. You won't be able to sort easily by count, but you could also periodically flush to another indexed count field periodically and sort by that.
Ikai Lan
That really won't work in this scenario - data is too granular to be able to effectively shard. I also need to keep historic records for a while. But thank you your time!
Linsidious
Read Ikai's link - sharded counters don't rely on sharding the data, but rather on sharding the counter itself.
Nick Johnson
A: 

This sounds very similar to a question that I posed on StackOverflow.

http://stackoverflow.com/questions/1183102/how-to-get-the-distinct-value-of-one-of-my-models-in-google-app-engine I needed to know how to get a distinct values for an entities within my models and there is going to be over 1000 entities for that model.

Hope that helps.

AutomatedTester
It is a similar issue, except the end result is not outputted as distinct entities, but as a sum or avg. Thanks!
Linsidious
A: 

Would a solution using cursors (like the one below) work for you? I personally use this method to count the number of entries in a scenario similar to yours, and haven't seen yet any problems with it (although I run on a schedule, since constant querying of the data store is pretty taxing on the CPU quota).

def count(query): 
  i = 0 
  while True: 
    result = query.fetch(1000) 
    i = i + len(result) 
    if len(result) < 1000: 
      break 
    cursor = query.cursor() 
    query.with_cursor(cursor) 
  return i 
David Parunakian