I have inherited a django+fastcgi application which needs to be modified to perform a lengthy computation (up to half an hour or more). What I want to do is run the computation in the background and return a "your job has been started"-type response. While the process is running, further hits to the URL should return "your job is still running" until the job finishes, at which point the results of the job should be returned. Any subsequent hit on the URL should return the cached result.

I'm an utter novice at Django and haven't done any significant web work in a decade, so I don't know if there's a built-in way to do what I want. I've tried starting the process via subprocess.Popen(), and that works fine except for the fact that it leaves a defunct entry in the process table. I need a clean solution that can remove temporary files and any traces of the process once it has finished.
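
As I understand it, the defunct entry appears because nothing ever wait()s on the finished child. Something along these lines (Unix only; the script name is just a placeholder) seems like it should avoid that, though I'm not sure it's the clean solution I'm after:

import signal
from subprocess import Popen

# Telling the kernel we will never wait() on children makes it reap them
# automatically, so no defunct entries are left in the process table.
signal.signal(signal.SIGCHLD, signal.SIG_IGN)

proc = Popen("python longjob.py", shell=True)  # returns immediately; the child
                                               # is cleaned up by the kernel on exit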

I've also experimented with fork() and threads and have yet to come up with a viable solution. Is there a canonical solution to what seems to me to be a pretty common use case? FWIW this will only be used on an internal server with very low traffic.

+3  A: 

Maybe you could look at the problem the other way around.

Maybe you could try DjangoQueueService, and have a "daemon" listening to the queue, checking whether there's something new and processing it.
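
A rough sketch of that idea, in case it's useful (the Job model and the polling script are made-up names for illustration, not something DjangoQueueService provides):

# models.py -- a minimal job table that the web views write to
from django.db import models

class Job(models.Model):
    command = models.CharField(max_length=255)
    status = models.CharField(max_length=16, default='pending')  # pending/running/done
    result = models.TextField(blank=True)

# jobdaemon.py -- started once outside the web server (cron, init script, screen...);
# needs DJANGO_SETTINGS_MODULE set so the ORM can be used outside Django.
import time
from mysite.myapp.models import Job

def process(job):
    # ... run the lengthy computation and store whatever it produces ...
    job.result = 'finished'

while True:
    for job in Job.objects.filter(status='pending'):
        job.status = 'running'
        job.save()
        process(job)
        job.status = 'done'
        job.save()
    time.sleep(5)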

changelog
That's definitely close to what I'm looking for. I stumbled upon that earlier but I'm hoping to find a solution that doesn't require me to add any additional dependencies. Thanks.
You can roll a queue system of your own then. I mean, it's not very difficult to do.
changelog
As the creator of Django Queue Service, I'd say look towards Celery or one of those queuing services instead. It was a neat hack in its day, but it has easily been surpassed now.
heckj
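
With Celery, the flow from the question maps onto something roughly like this (a sketch only: the task name, broker/backend URLs and view are made up, and Celery plus a broker have to be installed and configured separately):

# tasks.py
from celery import Celery

app = Celery('myapp',
             broker='redis://localhost:6379/0',    # example broker
             backend='redis://localhost:6379/0')   # result backend, so results can be fetched later

@app.task
def long_computation(params):
    # ... the half-hour job ...
    return 'the results'

# views.py
from celery.result import AsyncResult
from django.http import HttpResponse
from mysite.myapp.tasks import long_computation

def job(request):
    if 'task_id' not in request.session:
        async_result = long_computation.delay({})         # returns immediately
        request.session['task_id'] = async_result.id
        return HttpResponse('Your job has been started.')
    async_result = AsyncResult(request.session['task_id'])
    if async_result.ready():
        return HttpResponse(async_result.get())           # result is kept in the backend
    return HttpResponse('Your job is still running.')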
+1  A: 

I have to solve a similar problem now. It is not going to be a public site, but similarly, an internal server with low traffic.

Technical constraints:

  • all input data to the long running process can be supplied on its start
  • long running process does not require user interaction (except for the initial input to start a process)
  • the time of the computation is long enough so that the results cannot be served to the client in an immediate HTTP response
  • some sort of feedback (a sort of progress bar) from the long running process is required.

Hence, we need at least two web “views”: one to initiate the long running process, and the other, to monitor its status/collect the results.

We also need some sort of interprocess communication: send user data from the initiator (the web server, on an HTTP request) to the long running process, and then send its results to the receiver (again the web server, driven by HTTP requests). The former is easy, the latter is less obvious. Unlike in normal unix programming, the receiver is not known initially. The receiver may be a different process from the initiator, and it may start while the long running job is still in progress or after it has already finished. So pipes do not work and we need some permanence of the results of the long running process.

I see two possible solutions:

  • dispatch the long running process to a long running job manager (this is probably what the above-mentioned django-queue-service is);
  • save the results permanently, either in a file or in DB.

I preferred to use temporary files and to remember their location in the session data. I don't think it can be made any simpler.

A job script (this is the long running process), myjob.py:

import sys
from time import sleep

i = 0
while i < 1000:
    print 'myjob:', i
    i += 1
    sleep(0.1)
    sys.stdout.flush()

django urls.py mapping:

urlpatterns = patterns('',
    (r'^startjob/$', 'mysite.myapp.views.startjob'),
    (r'^showjob/$',  'mysite.myapp.views.showjob'),
    (r'^rmjob/$',    'mysite.myapp.views.rmjob'),
)

django views:

from tempfile import mkstemp
from os import fdopen, unlink, kill
from subprocess import Popen
import signal
from django.http import HttpResponse, HttpResponseRedirect

def startjob(request):
     """Start a new long running process unless already started."""
     if not request.session.has_key('job'):
          # create a temporary file to save the results
          outfd,outname=mkstemp()
          request.session['jobfile']=outname
          outfile=fdopen(outfd,'a+')
          proc=Popen("python myjob.py",shell=True,stdout=outfile)
          # remember pid to terminate the job later
          request.session['job']=proc.pid
     return HttpResponse('A <a href="/showjob/">new job</a> has started.')

def showjob(request):
     """Show the last result of the running job."""
     if not request.session.has_key('job'):
          return HttpResponse('Not running a job.'+\
               '<a href="/startjob/">Start a new one?</a>')
     else:
          filename=request.session['jobfile']
          results=open(filename)
          lines=results.readlines()
          try:
               return HttpResponse(lines[-1]+\
                         '<p><a href="/rmjob/">Terminate?</a>')
          except IndexError:
               # the job has not produced any output yet
               return HttpResponse('No results yet.'+\
                         '<p><a href="/rmjob/">Terminate?</a>')

def rmjob(request):
     """Terminate the runining job."""
     if request.session.has_key('job'):
          job=request.session['job']
          filename=request.session['jobfile']
          try:
               kill(job,signal.SIGKILL) # unix only
               unlink(filename)
          except OSError:
               pass # probably the job has finished already
          del request.session['job']
          del request.session['jobfile']
     return HttpResponseRedirect('/startjob/') # start a new one
jetxee
A: 

This isn't working for me because the process that creates the child process seems to wait until the child finishes before it returns. I can't get it to just run in the background.
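
One likely cause is that the child inherits the FastCGI socket and other open file descriptors, which keeps the request open until the child exits. A sketch of spawning the job fully detached (untested; outfile as in the answer above):

import os
from subprocess import Popen

devnull = open(os.devnull, 'r+')
proc = Popen("python myjob.py", shell=True,
             stdin=devnull, stdout=outfile, stderr=devnull,
             close_fds=True)  # don't let the child hold the server's sockets open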