I'm looking for a way to do asynchronous data processing with a daemon that uses the Django ORM. However, the ORM isn't thread-safe: retrieving or modifying Django objects from within threads is not safe. So what is the correct way to achieve asynchrony?

Basically, I need to take a list of users in the db, query a third-party API, and then update the user-profile rows for those users, all as a daemon or background process. Doing this in series per user is easy, but it takes too long to be at all scalable. If the daemon is retrieving and updating the users through the ORM, how do I process 10-20 users at a time? I would use a standard threading / queue system for this, but you can't thread interactions like

models.User.objects.get(id=foo) ...

Django itself is an asynchronous processing system which makes asynchronous ORM calls(?) for each request, so shouldn't there be a way to do it? I haven't found anything in the documentation so far.

Cheers

+2  A: 

If your asynchronous processing is done in its own process, then thread safety is not an issue: separate processes do not share an address space, so they can't interfere with each other. Each process has its own copy of the model objects, and concurrency is controlled by the database through transactions. So you're fine.

If you're going to spawn a thread inside one of the web server's processes to do your asynchronous work, then you need to lock all API calls that are not thread-safe.

from threading import Lock
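
To make the locking idea concrete, here is a minimal sketch of a thread / queue setup where only the non-thread-safe step is serialized. Everything in it is made up for illustration: fetch_remote stands in for the third-party API call, save_profile stands in for the ORM update, and a plain dict plays the role of the database. The slow API calls run in parallel, and the lock lets only one thread at a time into the part that isn't thread-safe.

```python
import queue
import threading

# Hypothetical stand-in for shared, non-thread-safe state. In the question
# this would be the ORM reads/writes on user-profile rows.
profiles = {}
orm_lock = threading.Lock()


def fetch_remote(user_id):
    """Placeholder for the slow third-party API call (safe to run in parallel)."""
    return {"user_id": user_id, "score": user_id * 10}


def save_profile(user_id, data):
    """Placeholder for the non-thread-safe update; guarded by the lock."""
    with orm_lock:
        profiles[user_id] = data


def worker(tasks):
    """Pull user ids off the queue until it is empty."""
    while True:
        try:
            user_id = tasks.get_nowait()
        except queue.Empty:
            return
        save_profile(user_id, fetch_remote(user_id))


def process_users(user_ids, num_threads=10):
    """Process all user ids with num_threads worker threads."""
    tasks = queue.Queue()
    for user_id in user_ids:
        tasks.put(user_id)
    threads = [
        threading.Thread(target=worker, args=(tasks,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return profiles
```

Calling process_users(range(1, 21)) would work through 20 users with 10 threads. Whether a single lock like this is enough for real ORM calls depends on your setup, so treat it as a starting point rather than a recipe.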

Apache uses multiple processes via the fork() system call to handle concurrent web requests. This is why Django's ORM APIs don't need to be thread-safe. I believe Apache may be able to use threads instead of processes, but I think that feature has to be disabled in order to use Django.

http://groups.google.com/group/django-developers/browse_thread/thread/905f79e350525c95

Btw, do you understand the difference between a thread and a process? It's kind of important.

Jay
+2  A: 

Have a look at Celery. I guess that would solve your problem. It uses the multiprocessing module. It needs a (very) little setup, but it helps a lot with scaling.

Shekhar