
Here is a simple example of a Django view with a potential race condition:

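# myapp/views.py
from django.http import HttpResponse
from my_libs import calculate_points

def add_points(request):
    # Assumes (hypothetically) a custom user model with a `points` field.
    user = request.user
    # Read-modify-write: two concurrent requests can read the same points
    # value, and the second save() silently overwrites the first.
    user.points += calculate_points(user)
    user.save()
    return HttpResponse("ok")  # a Django view must return a response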

The race condition should be fairly obvious: a user can make this request twice in quick succession, and both requests may read user.points before either one saves. Each request then adds its points to the same stale value, so the second save() overwrites the first and one request's points are lost.
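To make the interleaving concrete, here is a minimal, framework-free sketch using Python's built-in sqlite3 module. The `users` table, the starting value of 17, and the fixed increment of 1 are hypothetical stand-ins for the question's user model and `calculate_points()`; the two "requests" are interleaved by hand rather than run in real threads, so the outcome is deterministic.

```python
import sqlite3

def lost_update_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, points INTEGER)")
    conn.execute("INSERT INTO users (id, points) VALUES (1, 17)")
    conn.commit()

    def read_points():
        return conn.execute("SELECT points FROM users WHERE id = 1").fetchone()[0]

    def write_points(value):
        conn.execute("UPDATE users SET points = ? WHERE id = 1", (value,))
        conn.commit()

    # The race window: both requests read before either one writes.
    seen_by_a = read_points()
    seen_by_b = read_points()
    # Each adds its (hypothetical) 1 point to the stale value it read, then saves.
    write_points(seen_by_a + 1)
    write_points(seen_by_b + 1)  # overwrites request A's update
    return read_points()

print(lost_update_demo())  # 18, not 19: request A's point was lost
```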

Suppose the function calculate_points is relatively complicated, and makes calculations based on all kinds of weird stuff that cannot be placed in a single update and would be difficult to put in a stored procedure.

So here is my question: What kind of locking mechanisms are available to django, to deal with situations similar to this?

+3  A: 

You have many ways to single-thread this kind of thing.

One standard approach is Update First. You do an update which will seize an exclusive lock on the row; then do your work; and finally commit the change. For this to work, you need to bypass the ORM's caching.
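A minimal sketch of Update First, using the standard library's sqlite3 as a stand-in for a real database. One caveat: SQLite locks the whole database file on write, whereas PostgreSQL or MySQL/InnoDB lock only the updated row, so treat this as an illustration of the ordering (write first, then read, calculate, and commit), not of row-level granularity. The table, file name, and `calculate_points()` are hypothetical. The demo first checks that a second connection cannot write while the first holds the lock, then runs two requests one after the other. In later Django versions (1.4 and up), `select_for_update()` inside a transaction is the ORM's way of taking a comparable row lock.

```python
import os
import sqlite3
import tempfile

def calculate_points(points):
    # Hypothetical stand-in for the question's expensive calculation.
    return 1

def add_points_update_first(conn, user_id):
    conn.execute("BEGIN")
    # Update first: a no-op write that takes the write lock before anything
    # is read, so no other writer can change the row while we calculate.
    conn.execute("UPDATE users SET points = points WHERE id = ?", (user_id,))
    points = conn.execute("SELECT points FROM users WHERE id = ?", (user_id,)).fetchone()[0]
    conn.execute("UPDATE users SET points = ? WHERE id = ?",
                 (points + calculate_points(points), user_id))
    conn.execute("COMMIT")  # releases the lock

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scores.db")
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, points INTEGER)")
        setup.execute("INSERT INTO users (id, points) VALUES (1, 17)")
        setup.commit()
        setup.close()

        # isolation_level=None lets us issue BEGIN/COMMIT ourselves;
        # timeout=0 makes B fail immediately instead of waiting for the lock.
        a = sqlite3.connect(path, isolation_level=None)
        b = sqlite3.connect(path, isolation_level=None, timeout=0)

        # While A holds the lock, B's write is refused instead of interleaving.
        a.execute("BEGIN")
        a.execute("UPDATE users SET points = points WHERE id = 1")
        try:
            b.execute("UPDATE users SET points = points + 1 WHERE id = 1")
            b_was_blocked = False
        except sqlite3.OperationalError:  # "database is locked"
            b_was_blocked = True
        a.execute("ROLLBACK")
        b.close()

        # Two "requests" on separate connections: both increments survive.
        b = sqlite3.connect(path, isolation_level=None)
        add_points_update_first(a, 1)
        add_points_update_first(b, 1)
        points = a.execute("SELECT points FROM users WHERE id = 1").fetchone()[0]
        a.close()
        b.close()
        return b_was_blocked, points

print(demo())  # (True, 19)
```

In a real deployment the second writer would normally wait (a non-zero timeout) rather than fail, which is exactly the serialization the answer describes.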

Another standard approach is to have a separate, single-threaded application server that isolates the Web transactions from the complex calculation.

  • Your web application can create a queue of scoring requests, spawn a separate process, and then write the scoring requests to this queue. The spawn can be put in Django's urls.py so it happens on web-app startup, or into a separate manage.py admin script, or it can be done "as needed" when the first scoring request is attempted.

  • You can also create a separate WSGI-flavored web server using Werkzeug which accepts web-service (WS) requests via urllib2. If this server listens on a single port, requests are queued by TCP/IP. If your WSGI handler has only one thread, then you've achieved serialized, single-threaded processing. This is slightly more scalable, since the scoring engine is a WS request and can be run anywhere.
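A single-process sketch of the queue idea, using the standard library's queue and threading modules: concurrent "web requests" only enqueue scoring jobs, and one worker applies them one at a time. The answer suggests a separate process or a Werkzeug server for the worker; a thread is used here only to keep the example self-contained. `calculate_points()`, the `points` dict, and the user id 42 are hypothetical.

```python
import queue
import threading

def calculate_points(user_id):
    # Hypothetical stand-in for the expensive scoring calculation.
    return 1

def scoring_worker(jobs, points):
    # The only code that ever touches `points`, so updates are serialized.
    while True:
        user_id = jobs.get()
        if user_id is None:  # sentinel: shut down
            break
        points[user_id] = points.get(user_id, 0) + calculate_points(user_id)

def run(num_requests):
    jobs = queue.Queue()
    points = {}  # hypothetical stand-in for the users table
    worker = threading.Thread(target=scoring_worker, args=(jobs, points))
    worker.start()

    # Many concurrent "web requests" only enqueue work; they never write points.
    web_requests = [threading.Thread(target=jobs.put, args=(42,))
                    for _ in range(num_requests)]
    for r in web_requests:
        r.start()
    for r in web_requests:
        r.join()

    jobs.put(None)  # FIFO: handled after every queued scoring request
    worker.join()
    return points[42]

print(run(100))  # 100
```

Because only the worker writes, no lock is needed around the read-modify-write; the trade-off is that scoring becomes asynchronous, so the view can no longer return the new total immediately.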

Yet another approach is to have some other resource that has to be acquired and held to do the calculation.

  • A Singleton object in the database. A single row in a unique table can be updated with a session ID to seize control; update it with a session ID of None (NULL) to release control. The essential update has to include a WHERE SESSION_ID IS NULL filter to ensure that the update fails when the lock is held by someone else. This is interesting because it's inherently race-free: it's a single UPDATE, not a SELECT-UPDATE sequence.

  • A garden-variety semaphore can be used outside the database. Queues (generally) are easier to work with than a low-level semaphore.
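A minimal sketch of the singleton-row lock, using the standard library's sqlite3 so it runs anywhere. The `scoring_lock` table, its columns, and the session IDs are hypothetical; the point is that acquiring is one conditional UPDATE, and the row count tells you whether you won.

```python
import sqlite3

def create_lock(conn):
    # A single-row table; session_id IS NULL means "unlocked".
    conn.execute("CREATE TABLE scoring_lock "
                 "(id INTEGER PRIMARY KEY CHECK (id = 1), session_id TEXT)")
    conn.execute("INSERT INTO scoring_lock (id, session_id) VALUES (1, NULL)")
    conn.commit()

def acquire(conn, session_id):
    # One conditional UPDATE: it changes the row only if nobody holds the lock.
    cur = conn.execute(
        "UPDATE scoring_lock SET session_id = ? WHERE id = 1 AND session_id IS NULL",
        (session_id,),
    )
    conn.commit()
    return cur.rowcount == 1

def release(conn, session_id):
    # Only the current holder can release the lock.
    cur = conn.execute(
        "UPDATE scoring_lock SET session_id = NULL WHERE id = 1 AND session_id = ?",
        (session_id,),
    )
    conn.commit()
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
create_lock(conn)
print(acquire(conn, "session-a"))  # True
print(acquire(conn, "session-b"))  # False: already held
print(release(conn, "session-a"))  # True
print(acquire(conn, "session-b"))  # True
```

In practice you would probably also want some expiry on a held lock, since a session that crashes while holding it would otherwise block scoring forever.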

S.Lott
+1 I really like the scoring request queue idea.
Tom Leys
Great answer. Somehow access to the database row has to be serialized, and I think queues are more scalable than locks. @Fragsworth: see this project for a simple-to-use implementation of queues in Django that uses RabbitMQ: http://ask.github.com/celery/introduction.html
Van Gale
+3  A: 

Database locking is the way to go here. There are plans to add "select for update" support to Django, but for now the simplest option is to use raw SQL to UPDATE the user's row (which locks it until the transaction ends) before you start calculating the score.

zooglash
A: 

This may be oversimplifying your situation, but what about just a JavaScript link replacement? In other words when the user clicks the link or button wrap the request in a JavaScript function which immediately disables / "greys out" the link and replaces the text with "Loading..." or "Submitting request..." info or something similar. Would that work for you?

Wayne Koorts
-1 It still does not protect the site. From time to time users use HTTP clients other than browsers; e.g. a user might use wget to fetch the URL, and then disabling the link with JavaScript won't save you. JavaScript should be used only to make the page user-friendly if you want to, but you should not use it to fix problems in the server-side application.
SashaN
@SashaN: The poster didn't say that this would be accessed through anything other than a web browser. We can't immediately assume every other exception case, like wget. I also prefixed the answer with "This may be oversimplifying your situation..." to cover the exception cases, as this suggestion may well be a suitable solution for many. Think also of future viewers of this question who may have a slightly different scenario in which this answer might be just the ticket. I certainly don't accept that it deserves a "not helpful" vote, but I do appreciate you at least providing a reason.
Wayne Koorts
+4  A: 

You could use transactions to encapsulate your request. At the per-request level it looks like this:

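from django.db import transaction

# Run the whole view in one database transaction (Django 1.6+; older
# versions used @transaction.commit_on_success). @transaction.autocommit
# would instead keep Django's default commit-on-save behaviour.
@transaction.atomic
def add_points(request):
    ...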

This should be sufficient if you read and update the user data within the request.

If the user can also edit other fields in the form and then save this data, you need to do something like this:

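Store the last-modified timestamp in the request (for example, in a hidden form field). Before saving the new data, check that the row's timestamp is still the same. If it has changed, another request saved in between: that is the race condition, and you can display a message instead of overwriting the other change.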
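A minimal sketch of this optimistic check with the standard library's sqlite3. Two assumptions to flag: it swaps the timestamp for an integer `version` counter (which avoids clock-resolution ties), and it folds the check into the UPDATE's WHERE clause, so the check and the write cannot themselves race. The `users` table, its columns, and the form dicts are hypothetical.

```python
import sqlite3

def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users "
                 "(id INTEGER PRIMARY KEY, email TEXT, version INTEGER NOT NULL)")
    conn.execute("INSERT INTO users (id, email, version) VALUES (1, 'old@example.com', 1)")
    conn.commit()
    return conn

def load_form(conn, user_id):
    # Remember the version the user saw, e.g. in a hidden form field.
    email, version = conn.execute(
        "SELECT email, version FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return {"email": email, "version": version}

def save_form(conn, user_id, form):
    # Check and write in one statement: succeeds only if nobody saved in between.
    cur = conn.execute(
        "UPDATE users SET email = ?, version = version + 1 WHERE id = ? AND version = ?",
        (form["email"], user_id, form["version"]),
    )
    conn.commit()
    return cur.rowcount == 1  # False: show a "changed by someone else" message

conn = setup()
form_a = load_form(conn, 1)
form_b = load_form(conn, 1)
form_a["email"] = "a@example.com"
form_b["email"] = "b@example.com"
print(save_form(conn, 1, form_a))  # True
print(save_form(conn, 1, form_b))  # False: form_b was based on a stale version
```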

Ber
+1. This problem is already solved with transactions. Just run your view inside a transaction **and** use a transactional data store. If there is still a possibility of a race condition, detect it and rollback.
muhuk
+1. Thank you, this is *exactly* what I was looking for.
Fragsworth
No, transactions are not enough to prevent the problem - unless you are using the SERIALIZABLE transaction isolation level. Consider this scenario:
1. Thread A reads the user instance; user.points is 17.
2. Thread B reads the user instance; user.points is 17.
3. Thread A increments the points and saves the user (this causes the user record to be locked).
4. Thread A commits the transaction.
5. Now thread B can save the user, overriding what thread A saved.
6. Thread B commits the transaction.
7. The result: user.points is 18 in the database, instead of 19.
zooglash
(sorry for the bad formatting, seems like line breaks are not preserved inside comments)
zooglash
@zooglash: you are right. That's why I included the advice on using timestamps to track this case. In the case of adding points, I assume that the points are not read beforehand, but rather added up atomically in the add_points() method. In this case, a transaction is sufficient.
Ber
+3  A: 

As of Django 1.1 you can use the ORM's F() expressions to solve this specific problem. For more details see the documentation:

http://docs.djangoproject.com/en/1.1/ref/models/instances/#updating-attributes-based-on-existing-fields
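Assuming a model with a `points` field, the Django form is roughly `User.objects.filter(pk=user.pk).update(points=F('points') + delta)`, with `F` imported from `django.db.models`; the database then evaluates `points + delta` itself, so no stale value is ever written back. The sketch below shows the equivalent SQL with the standard library's sqlite3; the table, the starting value, and the fixed deltas are hypothetical. This only helps if `calculate_points()` can compute its delta without relying on a points value that might already be stale.

```python
import sqlite3

def atomic_increment_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, points INTEGER)")
    conn.execute("INSERT INTO users (id, points) VALUES (1, 17)")
    conn.commit()

    def add_points(delta):
        # The database computes points + delta at write time (what F() generates).
        conn.execute("UPDATE users SET points = points + ? WHERE id = 1", (delta,))
        conn.commit()

    # Both "requests" compute their delta first (the slow part), then write.
    delta_a = 1
    delta_b = 1
    add_points(delta_a)
    add_points(delta_b)
    return conn.execute("SELECT points FROM users WHERE id = 1").fetchone()[0]

print(atomic_increment_demo())  # 19: neither increment is lost
```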

bjunix