views:

327

answers:

1

Database migrations are a popular pattern, particularly with Ruby on Rails. Since migrations specify how to mold old data to fit a new schema, they can be helpful when you have production data that must be converted quickly and reliably.

But migrating models in App Engine is difficult since processing all entities sequentially is difficult, and there is no offline operation to migrate everything effectively in one big transaction.

What are your techniques for modifying a db.Model "schema" and migrating the data to fit the new schema?

+8  A: 

Here is what I do.

I have a MigratingModel class, which all of my models inherit from. Here is migrating_model.py:

"""Models which know how to migrate themselves"""

import logging
from google.appengine.ext import db
from google.appengine.api import memcache

class MigrationError(Exception):
  """Error migrating"""

class MigratingModel(db.Model):
  """A model which knows how to migrate itself.

  Subclasses must define a class-level migration_version integer attribute.
  """

  current_migration_version = db.IntegerProperty(required=True, default=0)

  def __init__(self, *args, **kw):
    if not kw.get('_from_entity'):
      # Assume newly-created entities needn't migrate.
      try:
        kw.setdefault('current_migration_version',
                      self.__class__.migration_version)
      except AttributeError:
        msg = ('migration_version required for %s'
                % self.__class__.__name__)
        logging.critical(msg)
        raise MigrationError, msg
    super(MigratingModel, self).__init__(*args, **kw)

  @classmethod
  def from_entity(cls, *args, **kw):
    # From_entity() calls __init__() with _from_entity=True
    obj = super(MigratingModel, cls).from_entity(*args, **kw)
    return obj.migrate()

  def migrate(self):
    target_version = self.__class__.migration_version
    if self.current_migration_version < target_version:
      migrations = range(self.current_migration_version+1, target_version+1)
      for self.current_migration_version in migrations:
        method_name = 'migrate_%d' % self.current_migration_version
        logging.debug('%s migrating to %d: %s'
                       % (self.__class__.__name__,
                          self.current_migration_version, method_name))
        getattr(self, method_name)()
      db.put(self)
    return self

MigratingModel intercepts the conversion from the raw datastore entity to the full db.Model instance. If current_migration_version has fallen behind the class's latest migration_version, then it runs a series of migrate_N() methods which do the heavy lifting.

For example:

"""Migrating model example"""

# ...imports...

class User(MigratingModel):
  migration_version = 3

  name = db.StringProperty() # deprecated: use first_name and last_name

  first_name = db.StringProperty()
  last_name = db.StringProperty()
  age = db.IntegerProperty()

  invalid = db.BooleanProperty() # to search for bad users

  def migrate_1(self):
    """Convert the unified name to dedicated first/last properties."""
    self.first_name, self.last_name = self.name.split()

  def migrate_2(self):
    """Ensure the users' names are capitalized."""
    self.first_name = self.first_name.capitalize()
    self.last_name = self.last_name.capitalize()

  def migrate_3(self):
    """Detect invalid accounts"""
    if self.age < 0 or self.age > 85:
      self.invalid = True

On a busy site, the migrate() method should retry if db.put() fails, and possibly log a critical error if the migration didn't work.

I haven't gotten there yet, but at some point I would probably mix-in my migrations from a separate file.

Final thoughts

It is hard to test on App Engine. It's hard to get access to your production data in a test environment, and at this time it is difficult-to-impossible to make a coherent snapshot backup. Therefore, for major changes, consider making a new version that uses a completely different model name which imports from the old model and migrates as it needs. (For example, User2 instead of User). That way, if you need to fall back to the previous version, you have an effective backup of the data.

jhs
+1 for Final Thoughts. It's Google, it's BigTable, use a new model name. Seems like the best solution.
Terry Lorber