I am trying to use AppEngine-MapReduce. I understand how to perform an operation over all entities of some entity_kind, but what is the easiest way to operate only on entities within a date range when the entity has a date attribute? Is there a simple way to pass parameters to the mapper?

For example, what if I only wanted to delete entities where:

entity.created >= start and entity.created < stop

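from google.appengine.ext import db
from mapreduce import operation as op

class Entity(db.Model):
  created = db.DateTimeProperty()

def process(entity):
  # Map function: called once per Entity; deletes every entity it receives.
  yield op.db.Delete(entity)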
A:

Currently there's no way to iterate over a query in a mapreduce; you have to iterate over every entity of the given kind. Instead, you should apply the filter in the map function and only delete the entities that match.
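A minimal sketch of filtering inside the map function, assuming the `created` property from the question. The `START` and `STOP` bounds and the `in_range` helper are hypothetical names introduced for illustration; they are hard-coded here because this sketch does not cover passing parameters to the mapper.

```python
from datetime import datetime

# Hypothetical bounds, hard-coded for illustration only.
START = datetime(2010, 1, 1)
STOP = datetime(2010, 2, 1)


def in_range(created, start=START, stop=STOP):
    """Return True when start <= created < stop (half-open range)."""
    return created is not None and start <= created < stop


def process(entity):
    # The mapper still sees every entity of the kind; it simply
    # skips the ones whose timestamp falls outside the range.
    if in_range(entity.created):
        # Imported lazily so this module also loads outside App Engine.
        from mapreduce import operation as op
        yield op.db.Delete(entity)
```

Every entity is still read once, so this costs a full scan of the kind; the filter only decides which entities produce a delete operation.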

Nick Johnson
What if I created a simple model called DateRange, where each entity had DateRange.start and DateRange.stop? Then could I perform a mapreduce across all DateRange entities and fetch my other models from entity.start to entity.stop? For example:

def process(entity):
  someEntities = SomeModel.all().filter('date >=', entity.start).filter('date <', entity.stop)

It seems like this would be the easiest way to look at specific date ranges using the current implementation. Thanks in advance, Nick.
Chris
Sure, but then you're replicating most of mapreduce's sharding logic to generate your date ranges. If you just want to delete records matching certain criteria, though, you may want to look into using cursors in conjunction with keys_only queries and the task queue.
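A rough sketch of the cursor approach, assuming the `created` property from the question and the `deferred` library as the way onto the task queue. The function name `delete_range`, its parameters, and the batch size are hypothetical choices, not part of any library.

```python
def delete_range(model_class, start, stop, cursor=None, batch_size=100):
    """Delete one batch with start <= created < stop, then chain a new task."""
    # Imported lazily so this module can be loaded outside App Engine.
    from google.appengine.ext import db, deferred

    # Keys-only query: returns keys without loading the full entities.
    query = db.Query(model_class, keys_only=True)
    query.filter('created >=', start).filter('created <', stop)
    if cursor:
        query.with_cursor(cursor)  # resume where the previous task stopped

    keys = query.fetch(batch_size)
    if not keys:
        return 0  # nothing left in the range

    db.delete(keys)
    # Hand the rest of the range to a new task, passing the cursor along.
    deferred.defer(delete_range, model_class, start, stop,
                   cursor=query.cursor(), batch_size=batch_size)
    return len(keys)
```

Under those assumptions, the first batch would be started with something like `deferred.defer(delete_range, Entity, start, stop)`. Each task handles one small batch, so no single request has to walk the whole range.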
Nick Johnson
I agree. Deleting records was just a simple example to see if I could perform operations over specific date ranges rather than over all entities.
Chris