Do you know which is the best approach for fetching chunks of results from a query?

1. Cursor

from google.appengine.api import memcache

# Person is a db.Model defined elsewhere.
q = Person.all()
# Resume from the cursor saved by the previous batch, if any.
last_cursor = memcache.get('person_cursor')
if last_cursor:
    q.with_cursor(last_cursor)
people = q.fetch(100)
# Save the cursor so the next batch continues where this one stopped.
cursor = q.cursor()
memcache.set('person_cursor', cursor)

2. Offset

from google.appengine.api import memcache

# Person is a db.Model defined elsewhere.
q = Person.all()
# Resume from the offset saved by the previous batch, defaulting to 0.
offset = memcache.get('offset')
if not offset:
    offset = 0
people = q.fetch(100, offset=offset)
memcache.set('offset', offset + 100)

Reading the Google documentation, it seems that the cursor approach does not add the overhead of a query offset.

+5  A: 

While it's hard to measure precisely and reliably, I'd be astonished if the cursor didn't run rings around the offset approach as soon as a sufficiently large set of Person entities is being returned. As the docs say very clearly and explicitly:

The datastore fetches offset + limit results to the application. The first offset results are not skipped by the datastore itself.

The fetch() method skips the first offset results, then returns the rest (limit results).

The query has performance characteristics that correspond linearly with the offset amount plus the limit.

I'm not sure how it could be any more explicit: O(offset + limit) is the big-O performance of fetching with an offset. If overall (say, over multiple scheduled tasks) you're fetching a million items, 1000 at a time, then when you fetch the last 1000 (with offset 999000) the datastore does not skip the first 999000 (even though fetch does not return them), so the performance impact will be staggering.
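
To make that concrete, here is a small back-of-the-envelope sketch (plain Python arithmetic, not App Engine code) for the million-item, 1000-per-batch scenario above; it just counts internal entity reads, under the assumption that each offset fetch costs offset + limit reads as the docs describe, while each cursor fetch costs only limit reads:

# Hypothetical cost model: an offset fetch makes the datastore retrieve
# offset + limit entities internally; a cursor fetch retrieves only limit.
N = 1000000      # total entities to scan
B = 1000         # batch size (limit) per fetch

offset_reads = sum(offset + B for offset in range(0, N, B))
cursor_reads = sum(B for _ in range(0, N, B))

print(offset_reads)  # 500500000 -> roughly 500 million internal reads
print(cursor_reads)  # 1000000   -> exactly N reads, one per entity

The absolute numbers are illustrative, but the shape of the curve (quadratic total work for offsets, linear for cursors) is the point.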

No such caveat applies to using cursors: fetching resumes exactly where it left off, without the datastore having to re-read all the (possibly many) items already fetched along that cursor in previous queries. Therefore, with performance O(limit), the elapsed time should be arbitrarily better than what you can obtain with an offset, once that offset gets sufficiently large.
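
As a rough illustration of that O(limit)-per-call behaviour, the cursor snippet from the question could be wrapped into a resumable batch function along these lines (a sketch using the same old db/memcache APIs; Person and process_person are placeholders assumed to be defined elsewhere):

from google.appengine.api import memcache

BATCH_SIZE = 100

def process_next_batch():
    # Build the query and resume at the cursor saved by the previous call.
    q = Person.all()
    last_cursor = memcache.get('person_cursor')
    if last_cursor:
        q.with_cursor(last_cursor)
    people = q.fetch(BATCH_SIZE)
    for person in people:
        process_person(person)   # placeholder for per-entity work
    # Persist the new cursor; the cost of each call stays proportional to
    # BATCH_SIZE no matter how deep into the result set we already are.
    memcache.set('person_cursor', q.cursor())
    return len(people)           # 0 means the scan has finished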

Alex Martelli