What is the best way to determine how many entities of a certain kind are in my app's datastore? The documentation says that `MyKind.all().count()` is only marginally better than retrieving all of the data, and it has a limit of 1000. That is not helpful, because I expect to have 6000+ instances of `MyKind` stored.

Is there a better way to do this? What if I just get the keys, and count those?

I'm using Python.

+1  A: 

Keep a counter object for your application stored in the datastore, and update it whenever you create or delete objects.

Chris B.
That's sort of a pain. It's not DRY, and now I have to go hunting all over the code to find any time that `MyKind` is created or deleted.
Rosarch
@Rosarch If `MyKind` instances are `put()` all over your code, maybe your design would benefit from more consistent application of DRY. :-)
Greg Bacon
+4  A: 

If an approximate count is good enough, you could use the statistics API:

http://code.google.com/appengine/docs/python/datastore/stats.html
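A minimal sketch of reading that statistic, assuming the legacy Python `db` statistics module (`google.appengine.ext.db.stats.KindStat`, with its `kind_name` and `count` properties). The function name `approximate_count` and its `kind_stat` parameter are hypothetical; the parameter exists only so the lookup can be exercised without the SDK. Because the statistics are recomputed periodically rather than on every write, the number can lag behind the true count, and a newly created kind may have no entry yet.

```python
def approximate_count(kind_name, kind_stat=None):
    """Return the entity count recorded by the datastore statistics, or None.

    kind_stat defaults to the legacy App Engine KindStat model; it is a
    parameter only so the lookup can be tried outside App Engine.
    """
    if kind_stat is None:
        # Legacy Python `db` statistics API (available only in the SDK).
        from google.appengine.ext.db import stats
        kind_stat = stats.KindStat
    stat = kind_stat.all().filter('kind_name =', kind_name).get()
    # Statistics are computed periodically, so a kind may have no entry yet.
    return stat.count if stat is not None else None
```

On App Engine itself you would simply call `approximate_count('MyKind')`.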

Saxon Druce
+1  A: 

If you do a keys-only query it should be pretty fast, since it only has to read the index and doesn't actually fetch any entities. Use a cursor and loop until `count()` returns fewer than 1000.
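A sketch of that loop, assuming the legacy `db` API (`Model.all(keys_only=True)`, `Query.fetch()`, `Query.cursor()` and `Query.with_cursor()`). It swaps the `count()` check for a keys-only `fetch()`, because `cursor()` marks the position after the last *fetched* result; `count_keys` and `BATCH_SIZE` are hypothetical names. Each iteration is still one datastore round trip per batch of keys.

```python
BATCH_SIZE = 1000  # the per-call cap described in the question


def count_keys(model_class, batch_size=BATCH_SIZE):
    """Count all entities of model_class using keys-only fetches and a cursor."""
    # keys_only=True reads index rows only; no entities are loaded.
    query = model_class.all(keys_only=True)
    total = 0
    while True:
        keys = query.fetch(batch_size)
        total += len(keys)
        if len(keys) < batch_size:
            # A short (or empty) batch means there is nothing left.
            return total
        # Resume the same query just after the last key fetched.
        query.with_cursor(query.cursor())
```

Usage would be `count_keys(MyKind)`, where `MyKind` is a `db.Model` subclass.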

Drew Sears
A: 

This SO question has an answer (by @jgeewax) that is almost right (wrong exit condition, as I commented there). Here is a fixed one...:

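```python
from google.appengine.ext import db

class MyModel(db.Expando):
    @classmethod
    def count_all(cls):
        """
        Count *all* of the rows (without maxing out at 1000)
        """
        count = 0
        # Keys-only: the query reads index rows, not whole entities.
        query = cls.all(keys_only=True).order('__key__')

        while True:
            current_count = query.count()  # at most 1000 per call
            count += current_count
            if current_count < 1000:
                # A short (or empty) batch is the last one.
                return count

            # Restart just past the 1000th key of this full batch.
            last_key = query.fetch(1, 999)[0]
            query = cls.all(keys_only=True).order('__key__').filter('__key__ >', last_key)
```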

The performance problem, of course, is that this issues at least one datastore query for every 1000 items you have. Denormalizing things by keeping an actual count, as @Chris suggests, uses far fewer queries. (Be sure to use a sharded counter or another efficient form of counter, as App Engine Fan explains!)
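To illustrate the sharded-counter idea, here is an in-memory sketch: a plain dict stands in for the datastore, and `ShardedCounter` and `NUM_SHARDS` are hypothetical names. On App Engine each shard would be its own entity, updated inside a transaction; spreading increments across shards avoids contention on a single counter entity, at the cost of summing every shard when reading the total.

```python
import random

NUM_SHARDS = 20  # hypothetical; more shards allow more concurrent writers


class ShardedCounter(object):
    """In-memory sketch of a sharded counter.

    A dict stands in for the datastore: on App Engine each shard would be a
    separate entity incremented inside its own transaction.
    """

    def __init__(self, name, num_shards=NUM_SHARDS):
        self.name = name
        self.num_shards = num_shards
        self.shards = {}  # shard key name -> partial count

    def _random_shard(self):
        # Picking a random shard spreads writes so they rarely collide.
        return '%s-%d' % (self.name, random.randrange(self.num_shards))

    def increment(self, delta=1):
        key = self._random_shard()
        self.shards[key] = self.shards.get(key, 0) + delta

    def decrement(self):
        self.increment(-1)

    def value(self):
        # Reading the total means summing every shard.
        return sum(self.shards.values())
```

Individual shards may go negative after decrements, but only the sum is meaningful.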

Denormalization is a fact of life with non-relational DBs and, done properly, can make a huge difference to your performance. As for your worries about DRY, just use class methods or other functions to perform all puts and removes of your entities (i.e., except within the class methods in question, never call methods such as `.put()` directly on the entities; call the appropriate class methods instead!), and those functions will be the obvious place to keep the denormalized counters up to date!
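A minimal sketch of that pattern in plain Python, assuming a dict as a stand-in for the datastore and an integer as a stand-in for the (possibly sharded) counter. The method names `create`, `remove` and `count_all` are hypothetical. The point is that the counter is touched in exactly one place; on the real datastore the entity write and the counter update are separate operations and may not be atomic together, which this sketch ignores.

```python
class MyKind(object):
    """In-memory sketch: all writes go through the class methods below.

    _store stands in for the datastore and _count for the denormalized
    counter, so the count is maintained in exactly one place.
    """
    _store = {}     # key -> entity (stand-in for the datastore)
    _count = 0      # stand-in for the denormalized (e.g. sharded) counter
    _next_key = 1

    def __init__(self, **fields):
        self.key = None
        self.fields = fields

    @classmethod
    def create(cls, **fields):
        entity = cls(**fields)
        entity.key = cls._next_key
        cls._next_key += 1
        cls._store[entity.key] = entity  # the only place entities are "put"
        cls._count += 1
        return entity

    @classmethod
    def remove(cls, entity):
        # Decrement only if the entity was actually stored.
        if cls._store.pop(entity.key, None) is not None:
            cls._count -= 1

    @classmethod
    def count_all(cls):
        return cls._count
```

Callers then use `MyKind.create(...)` and `MyKind.remove(entity)` instead of calling `put()` or `delete()` directly.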

Alex Martelli