views:

22

answers:

0

This is a general question on doing pagination with models containing multiple entity indexes. This is easier with an example so let us consider the example that Brett Slatkin provided in his video (http://www.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html).

You have your Message model (I have ignored the MessageIndex model) and Ive also added two extra properties (for publication and expiration times).

class Message(db.Model):
    sender = db.StringProperty()
    body = db.TextProperty()
    published = db.DateTimeProperty()
    expires = db.DateTimeProperty()

Now I am interested in doing filters on multiple fields, say messages published in a particular time window. eg

select * from Message where pubished > some_date and expires < some_other_date

Since GAE does not allow inequality filters on multiple fields, we have to break down the models into extra indexes. So now we have an index for the published and expires fields, giving us the following models (with a Message instance being the parent of MessagePublishedIndex and MessageExpiryIndex instances):

class Message(db.Model):
    sender = db.StringProperty()
    body = db.TextProperty()

class MessagePublishedIndex(db.Model):
    published = db.DateTimeProperty()

class MessageExpiryIndex(db.Model):
    expires = db.DateTimeProperty()

and the following key_only queries:

publish_keys = MessagePublishedIndex.all(key_only = True).filter("published >", some_date)
expire_keys = MessageExpiryIndex.all(key_only = True).filter("expires <", some_other_date)

msgs_by_pubdate = db.get([k.parent() for k in publish_keys])
msgs_by_expiry  = db.get([k.parent() for k in expire_keys])

Now Il have to do an intersection of these lists to find the common ones to get all messages within a particular time window.

This seems pretty wasteful. Is there an easier way to do this? Also this problem is exacerbated if the field in the index is a ListProperty because key_only queries cannot have "IN" filters. And even worse if I want pagination (ie "count" results from an "offset"), Il have to manually discard the first "offset" results and then do an intersection. Surely there has to be an easier (and smarter) way of doing this. Any ideas? It is bad enough GAE does not allow inequality filters on multiple fields (albeit for efficiency reasons), but to have to manually do all the zig-zags ourselves seems pretty inefficient (not to mention running over cpu and time limits).