When running mapreduce on Google App Engine, I occasionally come across a key reference that is no longer valid (it points to an entity that no longer exists). I'd like to write an efficient function that I could run across any model using mapreduce to verify that all the key references in each entity are still valid. What would be the most efficient way to do this? Here is my map function idea so far.

#map function
import logging
from google.appengine.api import datastore_types
from google.appengine.ext import db

def check_all_references(entity):
  for attr, value in entity.__dict__.iteritems():
    if isinstance(value, datastore_types.Key):
      # Check to see if the referenced entity exists
      # (one db.get() per reference; is there something more efficient?)
      if db.get(value) is None:
        logging.error('Entity %s referenced entity %s which is not valid.',
                      entity.key(), value)
A: 

What you need to do is accumulate all the keys and do a single batch get on them to check which ones still exist:

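import logging
from google.appengine.ext import db

def check_all_references(entity):
  # Build a list of (reference property name, key) pairs, skipping unset references
  refs = [(name, prop.get_value_for_datastore(entity))
          for name, prop in entity.properties().items()
          if isinstance(prop, db.ReferenceProperty)]
  refs = [(name, key) for name, key in refs if key is not None]
  if not refs:
    return
  # Fetch all the referenced entities in one batch get; missing ones come back as None
  referenced = db.get([key for name, key in refs])
  # Use a different loop name so the mapped entity is not shadowed
  for (name, key), ref_entity in zip(refs, referenced):
    if ref_entity is None:
      logging.error("Entity %s property %s references entity %s which does not exist",
                    entity.key(), name, key)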
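The batching pattern can be tried outside App Engine. Below is a minimal Python sketch under stated assumptions: `Entity`, `FakeStore` and `find_dangling_references` are hypothetical stand-ins, not part of the App Engine SDK, and `FakeStore.get` only mimics how a batch `db.get()` behaves (one call, results in the same order as the keys, `None` for keys that don't exist). It counts lookups, so you can see that all of an entity's references are checked with a single call rather than one call per property.

```python
import logging
from collections import namedtuple

# Hypothetical stand-in for a datastore entity (NOT the App Engine API):
# refs maps a reference property name to a key, or to None when unset.
Entity = namedtuple("Entity", ["key", "refs"])


class FakeStore(object):
    """Hypothetical in-memory store that counts batch lookups."""

    def __init__(self, entities):
        self._data = dict((e.key, e) for e in entities)
        self.round_trips = 0

    def get(self, keys):
        # Mimics a batch get: one round trip, results aligned with keys,
        # None for any key that has no stored entity.
        self.round_trips += 1
        return [self._data.get(k) for k in keys]


def find_dangling_references(entity, store):
    """Return (property name, key) pairs whose target no longer exists."""
    # Unset references are not dangling, so drop them before the lookup.
    refs = [(name, key) for name, key in sorted(entity.refs.items())
            if key is not None]
    if not refs:
        return []
    # A single batch lookup covers every reference on this entity.
    targets = store.get([key for _, key in refs])
    dangling = []
    # zip() keeps each (name, key) pair aligned with its lookup result.
    for (name, key), target in zip(refs, targets):
        if target is None:
            logging.error("Entity %s property %s references entity %s "
                          "which does not exist", entity.key, name, key)
            dangling.append((name, key))
    return dangling


author = Entity("Author:1", {})
post = Entity("Post:7", {"author": "Author:1", "editor": "Author:2",
                         "reviewer": None})
store = FakeStore([author, post])
print(find_dangling_references(post, store))  # [('editor', 'Author:2')]
print(store.round_trips)  # 1
```

Zipping a list of `(name, key)` pairs with the batch results is what keeps each missing entity matched to the property that referenced it.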
Nick Johnson
Should the 'refs' and 'refkeys' variables be the same? I get "global name 'refkeys' is not defined" with this code.
Chris
You're right, they should be. Refactoring fail.
Nick Johnson