I was working through the following example from Doug Hellmann's tutorial on multiprocessing:

import multiprocessing

def worker():
    """worker function"""
    print 'Worker'
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

When I tried to run it outside the if statement:

import multiprocessing

def worker():
    """worker function"""
    print 'Worker'
    return

jobs = []
for i in range(5):
    p = multiprocessing.Process(target=worker)
    jobs.append(p)
    p.start()

It started spawning processes non-stop, and the only way to stop it was to reboot!

Why would that happen? Why did it not create just 5 processes and exit? Why do I need the if statement?

+1  A: 

I don't know about multiprocessing, but I suspect that it spawns child processes that have a different __name__ global. By removing the test, you are making every child start the spawning process again.

Marcelo Cantos
+21  A: 

On Windows there is no fork(), so multiprocessing imports the current module in each child process to get access to the worker function. Without the if statement, that import runs the spawning loop again, so each child process starts its own children, and so on.
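
A minimal sketch of the guarded layout, assuming the same worker as in the question. The main() helper, the index argument and the join() calls are additions for illustration and are not part of the original tutorial code. Everything that starts processes lives behind the __name__ check, so a child that re-imports the module only picks up the function definitions:

```python
import multiprocessing


def worker(i):
    """Worker function; runs in a child process."""
    print('Worker %d' % i)


def main():
    """Start five workers, wait for them, and return their exit codes."""
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()
    # Wait for every child so the parent does not exit first.
    for p in jobs:
        p.join()
    return [p.exitcode for p in jobs]


if __name__ == '__main__':
    # Only the process that was launched as the script gets here.
    # A child that imports this module sees a different __name__
    # and therefore never reaches the spawning code.
    main()
```

The order in which the workers print is not guaranteed.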

Denis Otkidach
It is interesting to know now, after it cost me 2 reboots ;)
Ηλίας
How would you stop this once it has started? Killing the process in task manager does not seem to affect it.
Ηλίας
On POSIX systems there are resource limits and killall, but I don't know of a solution for Windows.
Denis Otkidach
The solution is to reboot :-) If you are quick you can use a warm reboot within the first second. After that only a cold reboot works, since the OS completely freezes.
nikow
The number of seconds depends on how many processes you spawn in the loop "for i in range(5):". The number 5 seems to give about a second to warm reboot. Without it, you can easily kill it.
Ηλίας
@nikow: Assuming you caught it within the first second, I wonder if "taskkill /F /IM python.exe" would kill it. I don't feel like rebooting, so not going to test this.
Brian
+3  A: 

Note that the documentation mentions that you need the if statement on Windows (here).

However, the documentation doesn't say that this kills your machine almost instantly, requiring a reboot. So this can be quite confusing, especially if the use of multiprocessing happens in some function deep inside the code. No matter how deeply hidden it is, you still need the if check in the main program file. This pretty much rules out using multiprocessing in any kind of library.

multiprocessing in general seems a bit rough. It might mirror the interface of the threading module, but there is just no simple way around the GIL.

For more complex parallelization problems I would also look at the subprocess module or some other libraries (like mpi4py or Parallel Python).

nikow
Any good tutorials on the subprocess package?
Ηλίας
Sorry, I didn't find any really simple ones (there is a PyMOTW article for example). Basically you create Python processes running your worker script. You can send/receive data by using the stdin/stdout of these processes (e.g., sending objects in pickled form).
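
A rough sketch of that idea, assuming Python 3 (it relies on sys.stdin.buffer and sys.stdout.buffer for binary I/O). The names WORKER_SOURCE and run_in_subprocess are made up for this example, and the worker is passed inline with -c only to keep the sketch in one file; normally it would be a separate worker script:

```python
import pickle
import subprocess
import sys

# Hypothetical worker: reads one pickled list of numbers from stdin
# and writes the pickled sum to stdout.
WORKER_SOURCE = r"""
import pickle, sys
numbers = pickle.load(sys.stdin.buffer)
pickle.dump(sum(numbers), sys.stdout.buffer)
"""


def run_in_subprocess(numbers):
    """Send `numbers` to a fresh Python process and return its reply."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # communicate() writes the input, closes stdin, and reads all output.
    out, _ = proc.communicate(pickle.dumps(numbers))
    if proc.returncode != 0:
        raise RuntimeError("worker exited with status %d" % proc.returncode)
    return pickle.loads(out)


if __name__ == "__main__":
    print(run_in_subprocess([1, 2, 3]))  # prints 6
```

Only unpickle data from processes you started yourself; pickle is not safe for untrusted input.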
nikow
Note that multiprocessing has its uses, and is still the simplest option if you can get it to work for your problem. But if it doesn't work out for you, then using subprocess isn't that much extra work (maybe a hundred lines of code) and it gives you more options.
nikow
This recipe gives an effective usage option for the subprocess package: http://code.activestate.com/recipes/577045/
Noctis Skytower
@Noctis: That link leads to a multiprocessing example.
nikow
It's documented, as heavily as I felt comfortable without adding flashing banners. Windows doesn't have fork(). If you have a suggestion for improving it, or patching it, feel free to contact me, file a bug, etc. I'm always happy to take user suggestions. Also note: internally, multiprocessing uses a Popen class similar to subprocess (Lib/multiprocessing/forking.py). It *is* sugar on top of raw subprocesses.
jnoller
@jnoller: Maybe the docs could say explicitly that leaving out the `if` will often bomb the machine? Before I first used `multiprocessing` I read the docs, forgot about the `if`, and was then confused because instead of getting some exception my computer just froze. Did you look at Parallel Python? They also use some magic for convenience, but less aggressively, and therefore don't have this problem. I feel that multiprocessing tries too hard to hide the complexities, because this just doesn't work and often backfires.
nikow
@jnoller: I also get a little mad when `multiprocessing` is sold as the solution to the GIL problem. That way `multiprocessing` was hyped as something that it is clearly not (but of course that isn't your fault). Anyway, I actually do use `multiprocessing` in one case where it does work well (and saves me the `subprocess` boilerplate code), so thank you for this module :-)
nikow