views:

99

answers:

3

How can multiple calculations be launched in parallel, while stopping them all when the first one returns?

The application I have in mind is the following: there are multiple ways of calculating a certain value; each method takes a different amount of time depending on the function parameters; by launching calculations in parallel, the fastest calculation would automatically be "selected" each time, and the other calculations would be stopped.

Now, there are some "details" that make this question more difficult:

  • The parameters of the function to be calculated include functions (which are calculated from data points; they are not top-level module functions). In fact, the calculation is the convolution of two functions. I'm not sure how such function parameters could be passed to a subprocess (they are not pickleable).
  • I do not have access to all calculation codes: some calculations are done internally by Scipy (probably via Fortran or C code). I'm not sure whether threads offer something similar to the termination signals that can be sent to processes.

Is this something that Python can do relatively easily?

A: 

Because of the global interpreter lock you would be hard pressed to get any speedup this way. In reality, even multithreaded programs in Python effectively run on one core. Thus, you would just be running N threads at 1/N times the speed. Even if one finished in half the time of the others, you would still lose time in the big picture.
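
For what it's worth, here is a minimal sketch that makes the effect visible; the cpu_bound function and iteration count are just illustrative:

# Two threads doing pure-Python CPU-bound work take about as long as
# doing the same work serially, because of the GIL (CPython).
import threading
import time

def cpu_bound(n=5000000):
    total = 0
    for i in range(n):
        total += i * i
    return total

start = time.time()
cpu_bound()
cpu_bound()
print("serial:    %.2fs" % (time.time() - start))

start = time.time()
threads = [threading.Thread(target=cpu_bound) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("2 threads: %.2fs" % (time.time() - start))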

Pace
Yeah, that's why I mentioned subprocesses too. :)
EOL
+1  A: 

I would look at the multiprocessing module if you haven't already. It offers a way of offloading tasks to separate processes whilst providing you with a simple, threading-like interface.

It provides the same kinds of primitives as you get in the threading module, for example worker pools and queues for passing messages between your tasks, but it allows you to sidestep the issue of the GIL, since your tasks actually run in separate processes.

The actual semantics of what you want are quite specific so I don't think there is a routine that fits the bill out-of-the-box, but you can surely knock one up.
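
For example, here is a minimal sketch of the "first result wins" pattern; the two worker functions are stand-ins for your actual calculation methods:

# Run several calculation methods in parallel processes, take the first
# result that arrives, then terminate the rest.
import multiprocessing
import time

def method_a(q):
    # stand-in for the fast method
    q.put(("method_a", 42))

def method_b(q):
    # stand-in for a slower method
    time.sleep(5)
    q.put(("method_b", 42))

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=f, args=(queue,))
             for f in (method_a, method_b)]
    for p in procs:
        p.start()
    name, result = queue.get()   # blocks until the fastest method answers
    for p in procs:
        p.terminate()            # stop the losers
        p.join()
    print("winner:", name, "result:", result)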

Note: if you want to pass functions around, they cannot be bound methods or locally defined functions, since those are not pickleable, and pickleability is a requirement for sharing data between your tasks.
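
A quick way to check whether a given function can be handed over at all (just an illustration):

# Module-level functions pickle by reference; locally defined ones do not.
import pickle

def top_level(x):
    return x + 1

def make_local():
    def local(x):   # defined inside another function
        return x + 1
    return local

pickle.dumps(top_level)        # fine
try:
    pickle.dumps(make_local())
except Exception as exc:       # PicklingError / AttributeError
    print("cannot pickle:", exc)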

jkp
`multiprocessing` works wonderfully for this trickier situation! I actually succeeded in passing a *non-pickleable function* as an argument to the calculations! I thought that `multiprocessing` was simply a convenient wrapper around `subprocess` and `threading`, but it looks to me like `multiprocessing` provides much more.
EOL
PS: I did check that both cores were fully used by the calculations.
EOL
A: 

Processes can be started and killed trivially.

You can do this.

import subprocess
import sys

# Launch each alternative solver script as an independent process.
watch = []
for s in ("process1.py", "process2.py", "process3.py"):
    sp = subprocess.Popen([sys.executable, s])
    watch.append(sp)

Now you're simply waiting for one of those to finish. When one finishes, kill the others.

import time

# Poll until the first process finishes.
winner = None
while winner is None:
    time.sleep(10)
    for w in watch:
        if w.poll() is not None:
            winner = w
            break

# Kill the processes that are still running.
for w in watch:
    if w.poll() is None:
        w.kill()

These are processes -- not threads. No GIL considerations. Make the operating system schedule them; that's what it does best.

Further, each process is simply a script that solves the problem using one of your alternative algorithms. They're completely independent and stand-alone. Simple to design, build and test.

S.Lott
Yeah, but the thing is that the program parameters include *functions* that are calculated on the fly (they are not top-level functions in a module, and are therefore not pickle-able). How can subprocesses handle that?
EOL
@EOL: What? You absolutely must be able to write a stand-alone Python script that will produce an answer one way, right? Write each variation as a stand-alone script. Get your parameters from a configuration file or something that you simply `import`. Don't mess with trying to pass things through the command-line interface to the subprocesses.
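
A minimal sketch of what that could look like; the file names, data points and the interp1d rebuild below are illustrative assumptions, not part of the answer:

# params.py -- the shared, importable "configuration": just the raw data points.
# The names x_points/y_points are placeholders.
x_points = [0.0, 0.5, 1.0, 1.5, 2.0]
y_points = [0.0, 0.8, 1.0, 0.7, 0.1]

# solver_variant1.py -- one stand-alone solver; each variant rebuilds the
# function from the shared data and then applies its own method.
from scipy.interpolate import interp1d
import params

f = interp1d(params.x_points, params.y_points)
# ... this script's particular convolution method would use f here ...
print(f(1.25))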
S.Lott
@S. Lott: Interesting idea. However, the parameters of the calculation are numerical functions that are relatively costly to rebuild from simple parameter files, so this "configuration file" approach slows things down. (Specifically, the calculation performs the convolution of functions that are interpolated and extrapolated from many data points. Each process would have to redo the interpolation/extrapolation, no?) I was hoping that Python would be more powerful than that. :)
EOL
@S. Lott: PS: thank you for the code snippets!
EOL
"Each process would have to redo the interpolation/extrapolation?" I don't see how or why. The processes could have a common front-end to create this "convolution of functions" and create an intermediate result set that the various solvers use. At this point, you're not talking about multiple processes and killing some when others are done. You're talking about something completely different. Please open a new question to discuss this convolution business separately and with some clarity and focus.
S.Lott
Thank you for your interest! I'm going to try to reuse your words: the "solvers" *are* the programs that perform the convolution. In other words, the current "front-end" performs an interpolation/extrapolation and builds functions f and g. The calculation to be performed (through different methods) is "convolve f and g over this interval"; here, the parameters to the convolution programs are non-pickleable functions. This is why I imagined that you would have to pass to the processes not these functions themselves, but only the data that allows them to be "rebuilt" in the convolution code.
EOL
PS: The `multiprocessing` module allowed me to perform calculations in parallel while passing non-pickleable functions to them, and without resorting to multiple Python programs (one per subprocess). I therefore don't have any more questions! :)
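
For reference, a minimal sketch of the kind of thing that works; it relies on the "fork" start method (the default on Linux), under which the child process simply inherits a locally built, non-pickleable function instead of pickling it. The quadratic stand-in function below is illustrative only:

import multiprocessing

def worker(func, x, queue):
    # func never goes through pickle here; only the numeric result does.
    queue.put(func(x))

def main():
    data_y = [0.0, 1.0, 4.0]

    def f(x):
        # locally defined closure: not pickleable; stand-in for the
        # interpolated/extrapolated function built from data points
        return data_y[1] * x * x

    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(f, 1.5, queue))
    p.start()
    print(queue.get())
    p.join()

if __name__ == "__main__":
    multiprocessing.set_start_method("fork")   # not available on Windows
    main()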
EOL