
I figure one way to do a count is like this:

from google.appengine.ext import db

foo = db.GqlQuery("SELECT * FROM bar WHERE baz = 'baz'")
my_count = foo.count()

What I don't like is that my count will be capped at 1000 and the query will probably be slow. Does anyone out there have a workaround? I have one in mind, but it doesn't feel clean. If only GQL had a real COUNT function...

+1  A: 

I haven't tried it, and this is an utter resource hog, but perhaps iterating with .fetch() and specifying the offset would work?

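from google.appengine.ext import db

LIMIT = 1000

def count(query):
    """Count results by fetching them LIMIT at a time (slow and costly)."""
    result = offset = 0
    gql_query = db.GqlQuery(query)
    while True:
        # fetch() returns a list of entities, so take its length
        batch = len(gql_query.fetch(LIMIT, offset))
        result += batch          # include the final, partial batch too
        if batch < LIMIT:        # a short batch means there is nothing left
            return result
        offset += LIMIT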
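The loop above needs App Engine to run. To check just its paging logic, here is a minimal sketch that assumes a hypothetical in-memory `fetch(limit, offset)` function standing in for a query's `fetch`; `count_by_paging` and `make_fake_fetch` are illustrative names, not App Engine API.

```python
LIMIT = 1000


def count_by_paging(fetch, limit=LIMIT):
    """Count results by requesting them `limit` at a time.

    `fetch(limit, offset)` is a hypothetical stand-in for a query's fetch
    method: it returns a list of at most `limit` items starting at `offset`.
    """
    result = offset = 0
    while True:
        batch = len(fetch(limit, offset))
        result += batch          # count every batch, including the last short one
        if batch < limit:        # a short (or empty) batch means we reached the end
            return result
        offset += limit


def make_fake_fetch(n):
    """Build an in-memory fetch over n dummy records (illustration only)."""
    records = list(range(n))

    def fetch(limit, offset):
        return records[offset:offset + limit]

    return fetch


print(count_by_paging(make_fake_fetch(2500)))  # 2500
```

Note that when the total is an exact multiple of the limit, the loop makes one extra fetch that comes back empty before it stops.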
orip
I've thought about that. Resource hog is right, but it would probably work.
barneytron
+9  A: 

You have to flip your thinking when working with a scalable datastore like GAE's and do your calculations up front. In this case that means keeping a counter for each baz and incrementing it whenever you add a new bar, instead of counting at display time.

from google.appengine.ext import db

class CategoryCounter(db.Model):
    category = db.StringProperty()
    count = db.IntegerProperty(default=0)

Then, when creating a Bar object, increment the counter:

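def createNewBar(category_name):
  bar = Bar(..., baz=category_name)

  # Model classes have no filter() of their own; start a query with all()
  counter = CategoryCounter.all().filter('category =', category_name).get()
  if not counter:
    # First Bar in this category: create its counter (starts at 0)
    counter = CategoryCounter(category=category_name)
  # Count the new Bar whether or not the counter already existed
  counter.count += 1
  bar.put()
  counter.put()

db.run_in_transaction(createNewBar, 'asdf')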

Now you have an easy way to get the count for any specific category:

CategoryCounter.all().filter('category =', category_name).get().count
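The write-time counting idea can be tried outside App Engine. Below is a minimal in-memory sketch, assuming a plain dict in place of the datastore; `CategoryCounts`, `create_bar` and `count` are hypothetical names, not App Engine API. It also shows the first Bar in a category being counted, so a new category starts at 1 rather than 0.

```python
from collections import defaultdict


class CategoryCounts:
    """In-memory stand-in for the CategoryCounter entities (illustration only).

    Counts are updated when a Bar is created, so reading a count is a single
    lookup instead of a scan over every Bar.
    """

    def __init__(self):
        self._counts = defaultdict(int)  # category name -> number of Bars
        self.bars = []                   # stand-in for stored Bar entities

    def create_bar(self, category_name):
        self.bars.append({"baz": category_name})
        # Increment for every new Bar, including the first in its category.
        self._counts[category_name] += 1

    def count(self, category_name):
        # Unknown categories have no Bars yet.
        return self._counts.get(category_name, 0)


counts = CategoryCounts()
for name in ["asdf", "asdf", "qwer"]:
    counts.create_bar(name)
print(counts.count("asdf"))  # 2
```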
Jehiah
barneytron
I confirmed the error I got doing the two .put() calls in a transaction: "Cannot operate on different entity groups in a transaction". I still like the idea of using two entities, though.
barneytron
Shouldn't the default counter value be 1?
dave paola
+1  A: 

Count functions in all databases are slow (e.g., O(n)); the GAE datastore just makes that more obvious. As Jehiah suggests, you need to store the computed count in an entity and refer to that if you want scalability.

This isn't unique to App Engine. Other databases just hide it better, up until the point where you're trying to count tens of thousands of records with each request and your page render time starts climbing along with the record count...

Nick Johnson
+5  A: 

+1 to Jehiah's response.

The official, blessed method for keeping object counters on GAE is to build a sharded counter. Despite the heavy-sounding name, it is pretty straightforward.
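To show the shape of the technique, here is a minimal in-memory sketch of a sharded counter. It assumes a plain Python list stands in for the shard entities; the class name and the default of 20 shards are arbitrary choices for illustration, not taken from App Engine.

```python
import random


class ShardedCounter:
    """Minimal in-memory sketch of a sharded counter (illustration only).

    On App Engine each shard would be its own entity, so concurrent
    increments are spread across several entities instead of all
    contending for one. Here a plain list stands in for those entities.
    """

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self):
        # Pick a shard at random to spread writes across shards.
        index = random.randrange(len(self.shards))
        self.shards[index] += 1

    def get_count(self):
        # Reading the total means summing every shard.
        return sum(self.shards)


counter = ShardedCounter(num_shards=5)
for _ in range(100):
    counter.increment()
print(counter.get_count())  # 100
```

The trade-off is that writes get cheaper to do concurrently while reads must touch every shard, which is fine when counts are read far less often than they are incremented.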

zgoda
Losing the ability to track counters on a per-user basis makes it a lot harder to weed out spammers though. How can you tackle this issue when using sharded counters?
Luke
A: 

The best workaround might seem a little counter-intuitive, but it works great in all my App Engine apps. Rather than relying on the integer key and the count() method, you add an integer field of your own to the model. It might seem wasteful until you actually have more than 1000 records and suddenly discover that fetch() and limit() DO NOT WORK PAST THE 1000-RECORD BOUNDARY.

from google.appengine.ext import db

class MyObj(db.Model):
  num = db.IntegerProperty()

When you create a new object, you must manually look up the highest existing num and add one:

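# Find the object with the highest num (descending sort, take the first)
latest = MyObj.all().order('-num').get()
next_num = latest.num + 1 if latest else 0
# Not transactional: two concurrent requests could compute the same num
new_obj = MyObj(num=next_num)
new_obj.put()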

This may seem like a waste of a query, but get() returns a single record off the top of the index. It is very fast.

Then, when you want to fetch past the 1000th object limit, you simply do:

MyObj.all().filter('num > ' , 2345).fetch(67)
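Repeating that filter with the last num you saw lets you walk every record without ever using an offset (a pattern often called keyset pagination). Below is a minimal in-memory sketch, assuming records are represented by their num values alone; `fetch_after` is a hypothetical stand-in for `MyObj.all().filter('num >', last_num).fetch(limit)`, not a real API.

```python
def fetch_after(records, last_num, limit):
    """Hypothetical stand-in for a 'num >' query: return up to `limit`
    records whose num is greater than last_num, in ascending num order."""
    matching = sorted(r for r in records if r > last_num)
    return matching[:limit]


def iterate_all(records, batch_size=1000):
    """Yield every record in num order, one batch at a time, with no offset."""
    last_num = -1  # nums start at 0 in the scheme above
    while True:
        batch = fetch_after(records, last_num, batch_size)
        if not batch:
            return
        for num in batch:
            yield num
        last_num = batch[-1]  # resume just after the highest num seen


nums = list(range(2500))
print(sum(1 for _ in iterate_all(nums)))  # 2500
```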

I had already done this when I read Aral Balkan's scathing review (http://aralbalkan.com/1504). It's frustrating, but once you get used to it and realize how much faster this is than count() on a relational DB, you won't mind...

Alyxandor