I'm writing a simple browser-based front end that should be able to launch a background task and then get progress from it. I want the browser to receive a response saying whether the task launched successfully, and then poll to determine when it is done. However, the presence of a background task seems to be stopping the XMLHttpRequest response from being sent immediately, so I can't report the success of launching the process. Consider the following (simplified) code:

import SocketServer
import SimpleHTTPServer
import multiprocessing
import time

class MyProc(multiprocessing.Process):
    def run(self):
        print 'Starting long process..'
        for i in range(100): time.sleep(1)
        print 'Done long process'

class Page(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/':
            print >>self.wfile, "<html><body><a href='/run'>Run</a></body></html>"
        if self.path == '/run':
            self.proc = MyProc()
            print 'Starting..'
            self.proc.start()
            print 'After start.'
            print >>self.wfile, "Process started."

httpd = SocketServer.TCPServer(('', 8000), Page)
httpd.serve_forever()

When I run this and browse to http://localhost:8000, I get a button named "Run". When I click on it, the terminal displays:

Starting..
After start.

However, the browser view does not change; in fact, the cursor keeps spinning. Only when I press Ctrl-C in the terminal to interrupt the program is the browser updated with the message "Process started."

The message "After start." is clearly being printed, so I can assume that do_GET returns after starting the process. Yet the browser doesn't get a response until after I interrupt the long-running process. I have to conclude that something inside SimpleHTTPServer is blocking between do_GET and the response being sent.

I've also tried this with threads and subprocess.Popen but ran into similar problems. Any ideas?

A: 

The answer is that the multiprocessing module forks a completely different process with its own stdout... So your application is running just as you wrote it:

  1. You start up the application in your terminal window.
  2. You click on the Run button in your browser, which does a GET on /run.
  3. You see the output of the current process in your terminal window: "Starting.."
  4. A new process, MyProc, is started with its own stdout and stderr.
  5. MyProc prints 'Starting long process..' to its own stdout (which goes nowhere).
  6. The moment MyProc starts up, your app prints "After start." to its stdout, since it was not told to wait for any kind of response from MyProc before doing so.

What you need is a Queue that communicates back and forth between your main application's process and the forked process. There are some multiprocessing-specific examples of how to do that here:

http://www.ibm.com/developerworks/aix/library/au-multiprocessing/

However, that article (like most articles from IBM) is kind of deep and overly complicated... You might want to take a look at a simpler example of how to use the "regular" Queue module (it is pretty much identical to the one included in multiprocessing):

http://www.artfulcode.net/articles/multi-threading-python/

The most important concepts to understand are how to shuffle data between processes using the Queue and how to use join() to wait for a response before proceeding.
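
A rough sketch of those two ideas (this is only an illustration, not code from either article; the worker function and the 'done' sentinel are made up for the example):

import multiprocessing
import time

def worker(queue):
    # Runs in the child process; reports progress back through the queue.
    for i in range(5):
        time.sleep(1)
        queue.put(i)            # send a progress update to the parent
    queue.put('done')           # sentinel so the parent knows the child is finished

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(queue,))
    proc.start()                # returns immediately; the child runs on its own
    msg = queue.get()           # blocks until the child sends something
    while msg != 'done':
        print 'Progress:', msg
        msg = queue.get()
    proc.join()                 # wait for the child process to exit cleanly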

Dan McDougall
My problem is not with communicating with the child process; my problem is that the message "Process started" does not appear in the browser until I _quit_ the child process. I should get this message immediately, since "After start" is clearly printed, but instead there is no HTTP response until the other process finishes, even though I am not waiting for it.
Steve
A: 

In addition to Steve's and my comments above, here is a solution that works for me.

The method used to determine the content length is a bit ugly. If you don't specify one, the browser may keep showing a spinning cursor even though the content is displayed. Closing self.wfile instead could also work.

import SimpleHTTPServer
from cStringIO import StringIO

# MyProc and the SocketServer setup are the same as in the question above.

class Page(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_GET(self):
        out = StringIO()
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        if self.path == '/':
            out.write("<html><body><a href='/run'>Run</a></body></html>\n")
        elif self.path == '/run':
            self.proc = MyProc()
            print 'Starting..'
            self.proc.start()
            print 'After start.'
            out.write("<html><body><h1>Process started</h1></body></html>\n")
        text = out.getvalue()
        # The explicit Content-Length lets the browser know when the response
        # body is complete, so it stops waiting (and stops spinning).
        self.send_header("Content-Length", str(len(text)))
        self.end_headers()
        self.wfile.write(text)
Bernd Petersohn
Cool, it works, thank you! When I said I called `send_header` above, I only used it for Content-type. It seems Content-Length is important here, then, which I assume means the data was transferred but the GET connection was not actually closed. I wish I understood more deeply why... somehow opening a background process stops SimpleHTTPServer from closing the connection? I tried `self.wfile.close()` in my original example but it didn't change anything.
Steve
@Steve: HTTP connections are often left open so that multiple requests can be made without reconnecting (pipelining). There is also an HTTP header for this ('Connection: keep-alive'). The Content-Length is necessary to inform the browser when all the data has been received. In my example, Firefox could render the page without a Content-Length but kept showing the spinning cursor. None of this is related to your background process.
Bernd Petersohn
I see. It's just that the same thing did not happen without the background process. In any case, I understand that it's probably good practice to include a Content-Length.
Steve
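
For reference, a minimal sketch of the headers discussed above (the handler name and response body here are illustrative only, not taken from the thread):

import SimpleHTTPServer
import SocketServer

class HeaderDemo(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_GET(self):
        body = "<html><body>ok</body></html>\n"
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        # Content-Length tells the browser exactly how many bytes to expect,
        # so it can finish the request even if the TCP connection stays open.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

SocketServer.TCPServer(('', 8000), HeaderDemo).serve_forever()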