I have a Django web application and I have some tasks that should operate (or actually: be initiated) in the background.

The application is deployed as follows:

  • apache2-mpm-worker;
  • mod_wsgi in daemon mode (1 process, 15 threads).

The background tasks have the following characteristics:

  • they need to operate in a regular interval (every 5 minutes or so);
  • they require the application context (i.e. the application packages need to be available in memory);
  • they do not need any input other than database access, in order to perform some not-so-heavy tasks such as sending out e-mail and updating the state of the database.

Now I was thinking that the simplest approach to this problem would be to piggyback on the existing application process (as spawned by mod_wsgi). By implementing the task as part of the application and exposing it through an HTTP interface, I would avoid the overhead of another process that holds the entire application in memory. A simple cron job can be set up to send a request to this HTTP interface every 5 minutes, and that would be it. Since the application process provides 15 threads and the tasks are quite lightweight and only run every 5 minutes, I figure they would not hinder the performance of the web application's user-facing operations.
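Concretely, a minimal sketch of what I have in mind; the view name, URL and task functions are hypothetical placeholders:

    from django.http import HttpResponse, HttpResponseForbidden

    def send_pending_emails():
        pass  # placeholder for the actual e-mail task

    def update_database_state():
        pass  # placeholder for the actual database task

    def run_periodic_tasks(request):
        # Only accept requests from localhost, so the tasks cannot be
        # triggered from the outside.
        if request.META.get('REMOTE_ADDR') not in ('127.0.0.1', '::1'):
            return HttpResponseForbidden()
        send_pending_emails()
        update_database_state()
        return HttpResponse('OK')

The cron job would then be a one-liner along the lines of */5 * * * * curl -s http://localhost/tasks/run/ >/dev/null, with the URL being whatever the view is mapped to.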

Yet... I have done some online research and I have seen nobody advocating this approach. Many articles suggest a significantly more complex approach based on a full-blown messaging component (such as Celery, which uses RabbitMQ). Although that's sexy, it sounds like overkill to me. Some articles suggest setting up a cron job that executes a script which performs the tasks. But that doesn't feel very attractive either, as it creates a new process that loads the entire application into memory, performs some tiny task, and then tears the process down again, repeated every 5 minutes. That does not sound like an elegant solution.
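For completeness, the cron-based alternative would typically be implemented as a custom Django management command; a minimal sketch, with a hypothetical command name and hypothetical task functions:

    # myapp/management/commands/run_tasks.py (hypothetical path and names)
    from django.core.management.base import BaseCommand

    from myapp.tasks import send_pending_emails, update_database_state

    class Command(BaseCommand):
        help = 'Run the periodic maintenance tasks once and exit.'

        def handle(self, *args, **options):
            send_pending_emails()
            update_database_state()

The crontab entry would then be something like */5 * * * * /path/to/manage.py run_tasks, which indeed pays the full application start-up cost on every single run.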

So, I'm looking for some feedback on the piggyback approach I suggested above. Is my reasoning correct? Am I overlooking any potential problems? And is my assumption correct that the application's performance will not be impeded?

A: 

All are reasonable approaches depending on your specific requirements.

Another is to fire up a background thread within the process when the WSGI script is loaded. This background thread could simply sleep and wake up occasionally to perform required work and then go back to sleep.

This method does require, though, that you have at most one Django process in which the background thread runs, to avoid different processes doing the same work on the database etc.

Using daemon mode with a single process as you are would satisfy that criterion. There are potentially other ways you could achieve that, though, even in a multiprocess configuration.
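A minimal sketch of such a WSGI script, assuming the single-process daemon mode setup from the question; the settings module, interval and task functions are hypothetical placeholders:

    import os
    import threading
    import time

    os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'  # hypothetical

    import django.core.handlers.wsgi

    from myapp.tasks import send_pending_emails, update_database_state  # hypothetical

    application = django.core.handlers.wsgi.WSGIHandler()

    def _periodic_worker():
        while True:
            time.sleep(300)  # wake up every 5 minutes
            try:
                send_pending_emails()
                update_database_state()
            except Exception:
                pass  # a real implementation would log the error

    _thread = threading.Thread(target=_periodic_worker)
    _thread.daemon = True  # do not block process shutdown
    _thread.start()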

Graham Dumpleton
Thanks for the suggestion Graham. Does mod_wsgi facilitate setting up something like that? Or would I have to construct it myself, in Python? I'd rather not enter the minefield called threaded programming, as I'm not sure if I'll make it out alive.
Tim Molendijk
Your daemon mode configuration is already multithreaded, so you have to pay attention to thread safety in your own Django application code anyway. As to mod_wsgi providing anything special, no, you just use the standard threading module to create the thread.
Graham Dumpleton
A: 

Note that Celery works without RabbitMQ as well. It can use a "ghetto queue" (SQLite, MySQL, Postgres, etc., as well as Redis or MongoDB), which is useful for testing or for simple setups where RabbitMQ seems like overkill.

See http://ask.github.com/celery/tutorials/otherqueues.html (Using Celery with Redis/Database as the messaging queue.)
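As a rough illustration, a minimal sketch using Redis as the broker; note this uses the current Celery API rather than the version covered by the tutorial above, and the broker URL and task name are hypothetical placeholders:

    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')

    @app.task
    def send_pending_emails():
        pass  # the actual work goes here

    # Run the task every 5 minutes via the celery beat scheduler.
    app.conf.beat_schedule = {
        'send-emails-every-5-minutes': {
            'task': 'tasks.send_pending_emails',
            'schedule': 300.0,  # seconds
        },
    }

A worker with an embedded beat scheduler can then be started with celery -A tasks worker -B.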

asksol