I've read in multiple places that GAE lifted the 1000-record limit on queries and counts. However, I can only seem to get a count of up to 1000 records. I won't be pulling more than 1000 records at a time, but the requirements are such that I need a count of all the matching records.

I understand you can use cursors to "paginate" through the dataset, but to cycle through just to get a count seems a bit much. Presumably when they said they "lifted" the limit, it was the hard limit - you still need to cycle through the results 1000 at a time, am I correct?

Should I be using something other than the .all()/filter approach to get counts above 1000?

Thanks in advance for all your help!

+2  A: 

The behavior of Query.count() is inconsistent with the documentation when no limit is explicitly specified - the documentation indicates that it will count "until it finishes counting or times out." GAE Issue 3671 reported this bug (about 3 weeks ago).

The workaround: explicitly specify a limit and then that value will be used (rather than the default of 1,000).

Testing on http://shell.appspot.com demonstrates this:

# insert 1500 TestModel entities ...
# ...
>>> TestModel.all(keys_only=True).count()
1000L
>>> TestModel.all(keys_only=True).count(10000)
1500L

I also see the same behavior on the latest version of the development server (1.3.7) using this simple test app:

from google.appengine.ext import webapp, db
from google.appengine.ext.webapp.util import run_wsgi_app

class Blah(db.Model): pass

class MainPage(webapp.RequestHandler):
    def get(self):
        for _ in xrange(3):
            # production allows at most 500 entities per batch put
            db.put([Blah() for _ in xrange(500)])
        c = Blah.all().count()          # no explicit limit: capped at 1000
        c10k = Blah.all().count(10000)  # explicit limit: counts up to 10000
        self.response.out.write('%d %d' % (c, c10k))
        # prints "1000 1500" on its first run

application = webapp.WSGIApplication([('/', MainPage)])

def main(): run_wsgi_app(application)
if __name__ == '__main__': main()
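
If even an explicit limit is not enough (for example, the count is large enough to risk a timeout), the cursor-based paging mentioned in the question can be sketched roughly as below. This is only an illustration, not part of the App Engine API: `count_in_batches`, `fetch_page` and `make_fake_fetch_page` are hypothetical names, and the in-memory fake stands in for a real query so that the snippet runs anywhere. With the `db` API, `fetch_page` would presumably call `query.with_cursor(cursor)`, then `query.fetch(n)`, and return the results together with `query.cursor()`.

```python
def count_in_batches(fetch_page, batch_size=1000, max_count=None):
    """Count results by paging through them.

    fetch_page(cursor, n) is a hypothetical callable that returns
    (items, next_cursor); cursor is None for the first page.
    """
    total = 0
    cursor = None
    while True:
        want = batch_size
        if max_count is not None:
            # Never ask for more than the remaining budget.
            want = min(batch_size, max_count - total)
            if want <= 0:
                break
        items, cursor = fetch_page(cursor, want)
        total += len(items)
        if len(items) < want:
            # A short page means the result set is exhausted.
            break
    return total


def make_fake_fetch_page(n_entities):
    """In-memory stand-in for a datastore query; the cursor is just an offset."""
    keys = list(range(n_entities))

    def fetch_page(cursor, n):
        start = cursor or 0
        page = keys[start:start + n]
        return page, start + len(page)

    return fetch_page
```

Calling `count_in_batches(make_fake_fetch_page(1500))` pages through the fake data 1000 keys at a time. Passing `max_count` bounds the work, much as the explicit limit to `count()` does.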
David Underhill
I'll try your solution and see how far I get. The notion that you have to supply a limit to the count is patently absurd, but hopefully it will get resolved soon. Thank you kindly!
etc
It's not absurd - counting costs O(n) time, and presumably there's an upper limit on how much time you are willing to spend counting?
Nick Johnson
@David That's weird! (P.S. Your second example can't work, since batch puts are limited to 500 entities.)
systempuntoout
@systempuntoout Good point. Unfortunately, the development server does allow batch puts of more than 500 entities (unlike the production server). I've tweaked the code so it would "work" on the production server too.
David Underhill
A: 

According to this App Engine blog post, the 1000-entity limit has only just been removed for count (and offset) in version 1.3.6. The limit had already been removed for fetch as of version 1.3.1. Upgrade to the latest version and the limit should be removed.

You do not need to cycle through results 1000 at a time (though you could, and it might even be more efficient); simply pass in the maximum number of results you'd like back:

    for m in MyModel.all().fetch(82000):
        pass  # process each entity m here

In versions before 1.3.1, the number passed in had to be less than or equal to 1000.

Cameron
Ideally, upgrading to the latest version would be the solution. Unfortunately, there is a bug in the latest version that makes the behavior inconsistent with the documentation: count() returns at most 1,000 unless you explicitly supply a limit greater than 1,000.
David Underhill
As Mr. Underhill stated, for whatever reason, bug or otherwise, a plain count on a query only produces 1000 even with the latest version.
etc