views: 85

answers: 2

Hi-

I'm having a bit of trouble in Google App Engine ensuring that my data is correct when using an ancestor relationship without key names.

Let me explain a little more: I've got a parent entity category, and I want to create a child entity item. I'd like to create a function that takes a category name and item name, and creates both entities if they don't exist. Initially I created one transaction and created both in the transaction if needed using a key name, and this worked fine. However, I realized I didn't want to use the name as the key as it may need to change, and I tried within my transaction to do this:

def add_item_txn(category_name, item_name):
    category_query = db.GqlQuery("SELECT * FROM Category WHERE name=:category_name", category_name=category_name)
    category = category_query.get()
    if not category:
        category = Category(name=category_name, count=0)

    item_query = db.GqlQuery("SELECT * FROM Item WHERE name=:name AND ANCESTOR IS :category", name=item_name, category=category)
    item_results = item_query.fetch(1)
    if len(item_results) == 0:
        item = Item(parent=category, name=name)

db.run_in_transaction(add_item_txn, "foo", "bar")

What I found when I tried to run this is that App Engine rejects it, since non-ancestor queries aren't permitted in a transaction: "Only ancestor queries are allowed inside transactions."

Looking at the example Google gives about how to address this:

def decrement(key, amount=1):
    counter = db.get(key)
    counter.count -= amount
    if counter.count < 0:    # don't let the counter go negative
        raise db.Rollback()
    db.put(counter)

q = db.GqlQuery("SELECT * FROM Counter WHERE name = :1", "foo")
counter = q.get()
db.run_in_transaction(decrement, counter.key(), amount=5)

I attempted to move my fetch of the category to before the transaction:

def add_item_txn(category_key, item_name):
    category = category_key.get()
    item_query = db.GqlQuery("SELECT * FROM Item WHERE name=:name AND ANCESTOR IS :category", name=item_name, category=category)
    item_results = item_query.fetch(1)
    if len(item_results) == 0:
        item = Item(parent=category, name=name)

category_query = db.GqlQuery("SELECT * FROM Category WHERE name=:category_name", category_name="foo")
category = category_query.get()
if not category:
    category = Category(name=category_name, count=0)
db.run_in_transaction(add_item_txn, category.key(), "bar")

This seemingly worked, but when I ran it with a number of concurrent requests I found that duplicate categories had been created, which makes sense: the category is queried outside the transaction, so multiple requests can each fail to find it and then create their own copy.

Does anyone have any idea how I can create these categories properly? I tried to put the category creation into a transaction, but received the error about ancestor queries only again.

Thanks!

Simon

A: 

A couple of things to note (I think they're probably just typos) -

  1. The first line of your transactional method calls get() on a key - this is not a documented function. You don't need to have the actual category object in the function anyway - the key is sufficient in both of the places where you are using the category entity.

  2. You don't appear to be calling put() on either the category or the item (but since you say you are getting data in the datastore, I assume you have left this out for brevity?)

As far as a solution goes - you could attempt to add a value in memcache with a reasonable expiry -

if memcache.add("category.%s" % category_name, True, 60):
    create_category(...)

This at least stops you creating multiples. It is still a bit tricky to know what to do if the query does not return the category but you cannot grab the lock from memcache - that means the category is in the process of being created.

If the originating request comes from the task queue, then just throw an exception so the task gets re-run.

Otherwise you could wait a bit and query again, although this is a little dodgy.

If the request comes from the user, then you could tell them there has been a conflict and to try again.
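
For illustration, here is a rough sketch of how the pieces above might fit together. The helper name is hypothetical; it assumes the old db and memcache APIs plus the Category model from the question:

from google.appengine.api import memcache
from google.appengine.ext import db

def get_or_create_category(category_name):
    # Fast path: the category already exists.
    category = db.GqlQuery("SELECT * FROM Category WHERE name = :1",
                           category_name).get()
    if category:
        return category

    # Only one request can win the add(); the 60s expiry frees the lock
    # if the winning request dies before it finishes creating the entity.
    if memcache.add("category.%s" % category_name, True, 60):
        category = Category(name=category_name, count=0)
        category.put()
        return category

    # Another request holds the lock, so the category is (probably) being
    # created right now - raise so a task-queue task gets retried, or
    # report a conflict to the user, as described above.
    raise Exception("Category %r is being created elsewhere; try again"
                    % category_name)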

hawkettc
thanks for catching my typos - rather than copying my exact function, I heavily edited it for brevity and introduced those :( This is an interesting solution - is memcache guaranteed not to collide?
Simon
memcache.add() guarantees atomicity - i.e. you can only add a key once - guaranteed. Of course with the above code, the entry expires in 60s and you can then grab the lock again. http://code.google.com/appengine/docs/python/memcache/clientclass.html#Client_add
hawkettc
A: 

Here is an approach to solving your problem. It is not an ideal approach in many ways, and I sincerely hope that some other AppEngineer will come up with a neater solution than I have. If not, give this a try.

My approach uses the following strategy: it creates entities that act as aliases for the Category entities. The name of a Category can change, but the alias entity retains its key, and we can use elements of the alias's key to create a key name for your Category entities. That way we can still look up a Category by its name, but its storage is decoupled from that name.

The aliases are all stored in a single entity group, which allows us to use a transaction-friendly ancestor query, so we can look up or create a CategoryAlias without risking that multiple copies will be created.

When I want to look up or create a Category and Item combo, I can use the category's key name to programmatically generate a key inside the transaction, and we are allowed to get an entity by its key inside a transaction.

from google.appengine.ext import db

class CategoryAliasRoot(db.Model):
    count = db.IntegerProperty()
    # Not actually used in current code; just here to avoid having an empty
    # model definition.

    __singleton_keyname = "categoryaliasroot"

    @classmethod
    def get_instance(cls):
        # get_or_insert is inherently transactional; no chance of
        # getting two of these objects.
        return cls.get_or_insert(cls.__singleton_keyname, count=0)

class CategoryAlias(db.Model):
    alias = db.StringProperty()

    @classmethod
    def get_or_create(cls, category_alias):
        alias_root = CategoryAliasRoot.get_instance()
        def txn():
            existing_alias = cls.all().ancestor(alias_root).filter('alias = ', category_alias).get()
            if existing_alias is None:
                existing_alias = CategoryAlias(parent=alias_root, alias=category_alias)
                existing_alias.put()

            return existing_alias

        return db.run_in_transaction(txn)

    def keyname_for_category(self):
        return "category_" + self.key().id

    def rename(self, new_name):
        self.alias = new_name
        self.put()

class Category(db.Model):
    pass

class Item(db.Model):
    name = db.StringProperty()

def get_or_create_item(category_name, item_name):

    def txn(category_keyname):
        category_key = db.Key.from_path('Category', category_keyname)

        existing_category = db.get(category_key)
        if existing_category is None:
            existing_category = Category(key_name=category_keyname)
            existing_category.put()

        existing_item = Item.all().ancestor(existing_category).filter('name = ', item_name).get()
        if existing_item is None:
            existing_item = Item(parent=existing_category, name=item_name)
            existing_item.put()

        return existing_item

    cat_alias = CategoryAlias.get_or_create(category_name)
    return db.run_in_transaction(txn, cat_alias.keyname_for_category())

Caveat emptor: I have not tested this code. Obviously, you will need to change it to match your actual models, but I think that the principles that it uses are sound.
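
For example, usage might look roughly like this (the names are made up for illustration and follow the sketch above):

# Create (or fetch) a category and an item under it.
item = get_or_create_item("Books", "The Pragmatic Programmer")

# Renaming the category later only touches the alias; the Category's key
# (and therefore every Item's ancestor path) stays the same.
alias = CategoryAlias.get_or_create("Books")
alias.rename("Paper Books")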

UPDATE: Simon, in your comment, you mostly have the right idea, although there is an important subtlety that you shouldn't miss. You'll notice that the Category entities are not children of the dummy root. They do not share a parent; they are themselves the root entities of their own entity groups. If the Category entities all had the same parent, that would make one giant entity group, and you'd have a performance nightmare, because each entity group can only have one transaction running on it at a time.

Rather, the CategoryAlias entities are the children of the bogus root entity. That allows me to query inside a transaction, but the entity group doesn't get too big because the Items that belong to each Category aren't attached to the CategoryAlias.
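
To make that concrete, the key paths under this scheme would look roughly like the following (the numeric ids are invented for illustration):

from google.appengine.ext import db

# The alias lives under the single dummy root, so it can be found with an
# ancestor query inside a transaction:
alias_key = db.Key.from_path('CategoryAliasRoot', 'categoryaliasroot',
                             'CategoryAlias', 123)

# The Category derived from that alias is its own root entity, so its
# entity group contains only the Category and its Items and stays small:
category_key = db.Key.from_path('Category', 'category_123')
item_key = db.Key.from_path('Category', 'category_123', 'Item', 456)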

Also, the data in the CategoryAlias entity can change without changing the entity's key, and I am using the alias's key as a data point for generating a key name that is used to create the actual Category entities themselves. So, I can change the name that is stored in the CategoryAlias without losing my ability to match that entity with the same Category.

Adam Crossland
Adam, thanks for the solution - much appreciated. I read through it a few times, and as I understand it, what you're doing here is creating a dummy root entity to ensure that when the category is fetched, it's fetched from an entity group (and thus allowed in a transaction) - is that right? Thanks! -simon
Simon
@Simon, I have edited my answer in response to your comment.
Adam Crossland
Very big distinction there re: the parent entity being distinct from the category - thanks for your answer!
Simon