I'm making a Django web-app which allows a user to build up a set of changes over a series of GETs/POSTs before committing them to the database (or reverting) with a final POST. I have to keep the updates isolated from any concurrent database users until they are confirmed (this is a configuration front-end), ruling out committing after each POST.

My preferred solution is to use a per-session transaction. This keeps all the problems of remembering what's changed (and how it affects subsequent queries), together with implementing commit/rollback, in the database where they belong. Deadlock and long-held locks are not an issue: due to external constraints there can only be one user configuring the system at any one time, and they are well-behaved.

However, I cannot find documentation on setting up Django's ORM to use this sort of transaction model. I have thrown together a minimal monkey-patch (ew!) to solve the problem, but dislike such a fragile solution. Has anyone else done this before? Have I missed some documentation somewhere?

(My version of Django is 1.0.2 Final, and I am using an Oracle database.)

+6  A: 

Multiple, concurrent, session-scale transactions will generally lead to deadlocks or worse (worse == livelock: long delays while locks are held by another session).

This design is not the best policy, which is why Django discourages it.

The better solution is the following.

  1. Design a Memento class that records the user's change. This could be a saved copy of their form input. You may need to record additional information if the state changes are complex; otherwise, a copy of the form input may be enough. (A minimal sketch follows this list.)

  2. Accumulate the sequence of Memento objects in the user's session. Note that each step in the transaction will involve fetches from the database and validation to see whether the chain of mementos will still "work". Sometimes they won't work, because someone else changed something that the chain of mementos depends on. What now?

  3. When you present the "ready to commit?" page, you've replayed the sequence of Mementos and are pretty sure they'll work. When the user submits "Commit", you have to replay the Mementos one last time, hoping they're still going to work. If they do, great. If they don't, someone changed something, and you're back at step 2: what now?
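
A minimal sketch of such a Memento, assuming each change can be captured as a copy of the form's cleaned_data (the class and function names here are illustrative, not part of the answer):

class Memento(object):
    """Records one user change as a copy of the validated form input."""
    def __init__(self, model, pk, form_data):
        self.model = model                 # model class being edited
        self.pk = pk                       # primary key of the target row
        self.form_data = dict(form_data)   # defensive copy of cleaned_data

    def replay(self, instance):
        # Apply the recorded change to an in-memory instance; note that
        # save() is deliberately not called here.
        for field, value in self.form_data.items():
            setattr(instance, field, value)
        return instance

def remember_change(request, model, pk, form):
    mementos = request.session.get('mementos', [])
    mementos.append(Memento(model, pk, form.cleaned_data))
    request.session['mementos'] = mementos   # reassign so the session saves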

This seems complex.

Yes, it does. However, it does not hold any locks, allowing blistering speed and leaving little opportunity for deadlock. The transaction is confined to the "Commit" view function, which actually applies the sequence of Mementos to the database, saves the results, and does a final commit to end the transaction.
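
Confining the transaction that way might look like the following sketch, using Django's commit_on_success decorator so the replay either commits as a whole or rolls back on any exception (the session key and Memento API follow the sketch above):

from django.db import transaction
from django.http import HttpResponse

@transaction.commit_on_success
def commit_view(request):
    # Replay every queued memento inside one short transaction.
    for memento in request.session.get('mementos', []):
        instance = memento.model.objects.get(pk=memento.pk)
        memento.replay(instance)
        instance.save()
    request.session['mementos'] = []
    # commit_on_success commits here if no exception was raised,
    # and rolls back otherwise.
    return HttpResponse("Changes committed.")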

The alternative -- holding locks while the user steps out for a quick cup of coffee on step n-1 out of n -- is unworkable.

For more information, see the Memento design pattern.

S.Lott
Deadlock and users stepping out for coffee are not an issue (there will be one controller, and by design the whole update is done under a single lock). Correct me if I'm wrong, but Mementos won't in fact work with the ORM, will they?
chrispy
Memento -- as a design pattern -- works with everything. A long-running, multi-step transaction with slowly accumulating locks will lead to deadlock. The only way to avoid deadlock is to have a single user.
S.Lott
The Django ORM provides an automatic mapping from tables to forms and back. Can I hook this up to the Memento pattern, or will I need to stop using this feature?
chrispy
A Memento is just an object that remembers the user's data. Since the input from a form is a simple dictionary, a copy of the dictionary can be used as a Memento. You might need to record additional information along with the copy of the form input.
S.Lott
So to ensure the data I pull from the ORM at each step matches where the user has got, should I reapply every queued memento for the user to the database?
chrispy
Yes... each step in the transaction is a replay of the prior steps. If the user makes additional changes, that appends another memento to the sequence of steps.
S.Lott
Bother. Some of those changes are going to take a while, definitely can't keep applying and reverting them. I think your answer is the right one for most situations, but mine is quite unusual. It's basically a front-end for a system administrator making heavyweight system changes.
chrispy
+1 It's hard to know without knowing exactly what kind of changes are being made, but the idea is that you could handle the "replay" in Python; don't actually apply changes to DB until they hit the final Commit, but write code that can take a sequence of steps and apply them to Python ORM objects before you display them to the user. Yes, it will be a bit of work but I think it's the right solution.
Carl Meyer
@chrispy: not apply and revert. Apply without save(). They will be silently dropped when the next request comes in.
S.Lott
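
To illustrate "apply without save()" with a hypothetical snippet (the model name and session key are assumptions, not from this thread):

instance = SystemConfig.objects.get(pk=pk)      # hypothetical model
for memento in request.session.get('mementos', []):
    memento.replay(instance)                    # mutate in memory only
# Render the page with `instance`. Since save() is never called, the
# replayed state is silently dropped when the request ends.
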
I've ended up using my monkeypatch -- it greatly reduces the coding burden, and I'm replacing an old app that did the same thing anyway -- but I'm accepting your answer as the better general solution. Thanks again.
chrispy
A: 

In case anyone else ever has the exact same problem as me (I hope not), here is my monkeypatch. It's fragile and ugly, and changes private methods, but thankfully it's small. Please don't use it unless you really have to. As mentioned by others, any application using it effectively prevents multiple users doing updates at the same time, on penalty of deadlock. (In my application, there may be many readers, but multiple concurrent updates are deliberately excluded.)

I have a "user" object which persists across a user session and contains a persistent connection object. When I validate that a particular HTTP interaction is part of a session, I also store the user object on django.db.connection, which is thread-local.

def monkeyPatchDjangoDBConnection():
    import django.db
    def validConnection():
        # Instead of letting Django open a fresh connection, reuse the
        # persistent connection stored on the session's user object.
        if django.db.connection.connection is None:
            django.db.connection.connection = django.db.connection.user.connection
        return True
    def close():
        # Detach the connection rather than closing it, so the
        # per-session transaction stays open between requests.
        django.db.connection.connection = None
    django.db.connection._valid_connection = validConnection
    django.db.connection.close = close
monkeyPatchDjangoDBConnection()

def setUserOnThisThread(user):
    import django.db
    # django.db.connection is thread-local, so this binds the session's
    # user (and hence its persistent connection) to the current thread.
    django.db.connection.user = user

This last function is called automatically at the start of any method annotated with @login_required, so 99% of my code is insulated from the specifics of this hack.
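
One way to get that behaviour (a sketch, not my actual code; the wrapper name and the get_session_user helper are assumptions):

from django.contrib.auth.decorators import login_required

def session_user_required(view):
    # Hypothetical wrapper: bind the session's user object (and thus its
    # persistent connection) to this thread, then require login as usual.
    def wrapped(request, *args, **kwargs):
        setUserOnThisThread(get_session_user(request))  # assumed helper
        return view(request, *args, **kwargs)
    return login_required(wrapped)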

chrispy
+1  A: 

I came up with something similar to the Memento pattern, but different enough that I think it bears posting. When a user starts an editing session, I duplicate the target object to a temporary object in the database. All subsequent editing operations affect the duplicate. Instead of saving the object state in a memento at each change, I store operation objects. When I apply an operation to an object, it returns the inverse operation, which I store.

Saving operations is much cheaper for me than mementos, since the operations can be described with a few small data items, while the object being edited is much bigger. Also I apply the operations as I go and save the undos, so that the temporary in the db always corresponds to the version in the user's browser. I never have to replay a collection of changes; the temporary is always only one operation away from the next version.
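
A rough sketch of one such operation object; the names are hypothetical, and the essential property is that apply() returns its own inverse:

class SetField(object):
    """One edit step: set `field` to `value` on the target object."""
    def __init__(self, field, value):
        self.field = field
        self.value = value

    def apply(self, obj):
        # Capture the old value first, so the returned operation
        # exactly undoes this one.
        inverse = SetField(self.field, getattr(obj, self.field))
        setattr(obj, self.field, self.value)
        obj.save()
        return inverse   # push this onto the undo stack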

To implement "undo," I pop the last undo object off the stack (as it were -- by retrieving the latest operation for the temporary object from the db), apply it to the temporary, and return the transformed temporary. I could also push the resultant operation onto a redo stack if I cared to implement redo.

To implement "save changes," i.e. commit, I de-activate and time-stamp the original object and activate the temporary in its place.
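
That swap might look something like this, assuming an `active` flag and a `superseded` timestamp on the model (both field names are illustrative):

from datetime import datetime

def commit_edit_session(original, temporary):
    # Retire the original and promote the edited duplicate in its place.
    original.active = False
    original.superseded = datetime.now()
    original.save()
    temporary.active = True
    temporary.save()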

To implement "cancel," i.e. rollback, I do nothing! I could delete the temporary, of course, because there's no way for the user to retrieve it once the editing session is over, but I like to keep the canceled edit sessions so I can run stats on them before clearing them out with a cron job.