I have a blog-like application with stories and categories:

from django.db import models

class Category(models.Model):
    ...

class Story(models.Model):
    # Each story can belong to any number of categories.
    categories = models.ManyToManyField(Category)
    ...

Now I know that when you save a new instance of a model with a many-to-many field, problems come up because the object is not yet in the database. This problem usually manifests itself on form submission, which can be neatly worked around with story_form.save(commit=False). What about a situation where there are no forms to speak of? In my case, I want to build an API to accept remote submissions. Since I like JSON, and a whole lot of other messaging in our company is in JSON (including outgoing messages from this server), I'd like to be able to receive the following:

{ "operation": "INSERT",
  "values": [
            { "datatype": "story",
              "categories": [4,6,8],
              "id":50,
              ...
            }
            ]
}

and implement a factory that converts the values to instances. But I'd like the factory to be as agnostic as possible to the type of operation. So:

{ "operation": "UPDATE",
  "values": [
            { "datatype": "story",
              "categories": [4,6,8],
              "id":50,
              ...
            }
            ]
}

should also be converted in the same way, except that INSERT ignores id and UPDATE gets the already existing instance and overrides it. (The remote submitter listens to a feed that gives it, among other things, the category objects to cache, so it can, and must, refer to them by id, but it doesn't have any direct communication with the database.)
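
A minimal sketch of the first half of such a factory, assuming the message shape shown above. The names `split_message` and `M2M_FIELDS` are hypothetical, and the registry of many-to-many field names is written by hand here rather than read from the models. It only separates plain values from many-to-many ids and applies the INSERT/UPDATE rule for id; it does not touch the database.

```python
import json

# Hypothetical registry: which keys of each datatype are many-to-many fields.
M2M_FIELDS = {"story": {"categories"}}


def split_message(raw):
    """Parse one message and return (operation, items).

    Each item holds the datatype, the primary key to look up (None for
    INSERT, which ignores any submitted id), the plain field values, and
    the many-to-many ids kept apart for a later step.
    """
    message = json.loads(raw)
    operation = message["operation"]
    if operation not in ("INSERT", "UPDATE"):
        raise ValueError("unsupported operation: %r" % operation)

    items = []
    for values in message["values"]:
        values = dict(values)  # copy so the parsed message is left untouched
        datatype = values.pop("datatype")
        submitted_id = values.pop("id", None)
        m2m_names = M2M_FIELDS.get(datatype, set())
        items.append({
            "datatype": datatype,
            "pk": submitted_id if operation == "UPDATE" else None,
            "plain": {k: v for k, v in values.items() if k not in m2m_names},
            "m2m": {k: v for k, v in values.items() if k in m2m_names},
        })
    return operation, items
```

Keeping the many-to-many ids in their own dict means the code that eventually saves the instance is the only place that has to know the ids cannot be attached until the row exists.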

My real question is: what's the easiest consistent way to inflate an instance of a Django model object that has a ManyToManyManager involved? As far as I can fathom, any insert of an object with a many-to-many field will require two database hits, just because it is necessary to obtain a new id first. But my current awkward solution is to save the object right away and mark it hidden, so that functions down the line can play with it and save it as something a little more meaningful. It seems like one step up would be overriding save so that objects without ids save once, copy some proxy field to categories, then save again. Best of all would be some robust manager object that saves me the trouble. What do you recommend?

+3  A: 

"As far as I can fathom, any insert of an object with a many-to-many field will require two database hits,..."

So what?

Micromanaging each individual database access generally isn't worth all the thinking. Do the simplest, most obvious thing so that Django can handle optimization and caching for you.

Your application performance is --typically-- dominated by the slow download to the browser, and all the JPEGs, CSS and other static content that is part of your page.

Time spent in brain-cramping thinking about how to make two Primary Keys (for a many-to-many relationship) without doing two database accesses is not going to pay off. Two PKs is usually two database accesses.
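
A sketch of what the simple, obvious version might look like. The function name `save_with_m2m` and its parameters are hypothetical. It assumes the related manager's `set()` method, which exists in Django 1.9 and later; older versions assigned the list directly to the attribute instead.

```python
def save_with_m2m(model, plain, m2m, pk=None):
    """Create or update one instance, then attach its many-to-many ids.

    model: a Django model class such as Story.
    plain: dict of ordinary field values.
    m2m:   dict mapping a many-to-many field name to a list of related ids.
    pk:    primary key of the row to overwrite, or None to insert.
    """
    # UPDATE fetches the existing row; INSERT starts from a blank instance.
    instance = model.objects.get(pk=pk) if pk is not None else model()
    for name, value in plain.items():
        setattr(instance, name, value)

    # First database access: the row gets its primary key here.
    instance.save()

    # Further accesses: the join rows need that primary key to exist.
    for name, ids in m2m.items():
        getattr(instance, name).set(ids)
    return instance
```

For the models in the question, a call could look like `save_with_m2m(Story, {}, {"categories": [4, 6, 8]})` for an INSERT, with `pk=50` added for an UPDATE.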


Edit

"...litters the database on error..."

Django has transactions. See http://docs.djangoproject.com/en/dev/topics/db/transactions/#managing-database-transactions. Use the @transaction.commit_manually decorator.
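
To illustrate why a transaction answers the littering concern, here is a small sketch that uses Python's standard sqlite3 module rather than Django, so it runs on its own. The table layout and the names `make_db` and `insert_story` are assumptions made for the example. In Django itself the same wrapping would be done with the transaction decorators; Django 1.6 and later provide `transaction.atomic` for this purpose.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a story table and a join table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE story (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE story_categories ("
        "story_id INTEGER NOT NULL, category_id INTEGER NOT NULL)")
    conn.commit()
    return conn


def insert_story(conn, title, category_ids):
    """Insert the story row and its join rows as one unit of work."""
    # The connection as a context manager commits if the block succeeds
    # and rolls back if it raises, so a failure leaves no story row behind.
    with conn:
        cursor = conn.execute(
            "INSERT INTO story (title) VALUES (?)", (title,))
        story_id = cursor.lastrowid
        for category_id in category_ids:
            if not isinstance(category_id, int):
                raise ValueError("bad category id: %r" % (category_id,))
            conn.execute(
                "INSERT INTO story_categories VALUES (?, ?)",
                (story_id, category_id))
    return story_id
```

The early save still happens, but if anything after it fails, the rollback removes the half-built story along with any join rows already written.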

"forces validation that is meant to occur later"

Doesn't make sense -- update your question to explain this.

S.Lott
+1 for working in the phrase "brain-cramping". In all seriousness, though, +1 for pointing out that micromanaging the DB probably isn't worth it. Using Django's ORM brings too many benefits to worry about minutiae in a case like this.
Jarret Hardie
I agree that it isn't worth it to worry about two database hits. (I don't even think I could do any better with raw SQL.) But I don't like having to save early just to make my implementation work. It litters the database on error and forces validation that is meant to occur later.
David Berger
+1  A: 

I commented on S.Lott's post that I feel his answer is the best. He's right: if the goal is just to avoid two database hits, then you're just in for a world of unnecessary pain.

Reading your reference to ModelForm, however, if you are looking instead for a solution that allows you to defer official saving in some way, you may wish to have a look at the save_instance() function in django.forms.models. The inner function save_m2m is how the delayed many-to-many save is accomplished for forms. Implementing something for models without forms would basically follow the same principle.
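
A rough sketch of that principle without a form. This is not the source of save_instance(); the name `build_unsaved` is hypothetical, and it assumes the related manager's `set()` method from Django 1.9 and later. It builds the instance without saving it and hands back a closure that writes the many-to-many ids once the caller has saved.

```python
def build_unsaved(model, plain, m2m):
    """Return (instance, save_m2m) without touching the database.

    model: a Django model class such as Story.
    plain: dict of ordinary field values passed to the constructor.
    m2m:   dict mapping a many-to-many field name to a list of related ids.
    """
    instance = model(**plain)

    def save_m2m():
        # Must be called only after instance.save(), because the join rows
        # need the instance's primary key.
        for name, ids in m2m.items():
            getattr(instance, name).set(ids)

    return instance, save_m2m
```

The caller can adjust or validate the instance first, then call `instance.save()` followed by `save_m2m()`, which is the same order a form expects after `save(commit=False)`.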

Having said that, and coming back to S.Lott's post, a ModelForm and an actual Model are somewhat different cases. Because forms expose only a "safe" set of data to be edited in a browser ("safe" because it is filtered in some way, or excludes critical fields that a user shouldn't be editing), it is a reasonable design expectation that someone might need to add important information to the form-derived model before saving. This is why Django has commit=False.

This expectation falls down for cases where you are directly instantiating models. Here you have programmatic access to the model API, so you will probably find that using that API directly is easier to maintain and less error-prone than going through generalized indirection. I can understand why you are picturing the factory concept, but in this case you may find the effort to create a bullet-proof generalization for all manner of models is a complication that's just not worth it.

Jarret Hardie
I more or less like this. I still think it's nonsense that you can't access an m2m field before a save. The intended behavior is obvious.
David Berger