The module is run via python myscript.py (not typed into the interactive shell):

import uuid
import time
import multiprocessing


def sleep_then_write(content):
    time.sleep(5)
    print(content)

if __name__ == '__main__':
    for i in range(15):
        p = multiprocessing.Process(target=sleep_then_write,
                                    args=('Hello World',))
        p.start()

print('Ah, what a hard day of threading...')

This script produced the following output:

Ah, what a hard day of threading...
Ah, what a hard day of threading...
Ah, what a hard day of threading...
Ah, what a hard day of threading...
Ah, what a hard day of threading...
Ah, what a hard day of threading...
Ah, what a hard day of threading...
Ah, what a hard day of threading...
Ah, what a hard day of threading...
AAh, what a hard day of threading..
h, what a hard day of threading...
Ah, what a hard day of threading...
Ah, what a hard day of threading...
Ah, what a hard day of threading...
Ah, what a hard day of threading...
Ah, what a hard day of threading...
Hello World
Hello World
Hello World
Hello World
Hello World
Hello World
Hello World
Hello World
Hello World
Hello World
Hello World
Hello World
Hello World
Hello World
Hello World

Firstly, why the heck did it print the bottom statement sixteen times (once for each process) instead of just once?

Second, notice the AAh and h about halfway down; that is the real output. This makes me wary of ever using threads now.

(Windows XP, Python 2.6.4, Core 2 Duo)

+2  A: 

Because Windows lacks the standard fork system call, the multiprocessing module behaves somewhat differently there. For one thing, it imports the main module (your script) once for each process it starts. For a detailed explanation, see the "Windows" section at http://docs.python.org/library/multiprocessing.html#multiprocessing-programming
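
A minimal sketch of what that re-import means in practice (the worker function and process count here are illustrative, not taken from the question's script): module-level code runs again in every child process on Windows, while code under the __name__ guard runs only in the parent.

import multiprocessing

def worker():
    print('work done in a child process')

# This line runs in the parent *and* again in every child,
# because each child re-imports this module on Windows.
print('module-level code: runs once per process')

if __name__ == '__main__':
    # This block runs only in the parent, because only the parent
    # executes the module as "__main__".
    processes = [multiprocessing.Process(target=worker) for _ in range(3)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()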

Matti Virkkunen
@Matti, thanks. Are the weird `h` and `AAh` lines from that too? From the looks of it, I would say that two threads were sharing the same stdout at the same time.
orokusaki
Indeed they are sharing the same stdout. A context switch occurred just as one process had written the first character of its output.
Matti Virkkunen
Separate threads share the same stdout on Unix too, no?
BlueRaja - Danny Pflughoeft
@BlueRaja That's not good... Perhaps the problem is that there really is only one stdout and Windows just doesn't know how to share it correctly.
orokusaki
@orokusaki: this is how it works in every OS I'm aware of; the problem is not with Windows, but with your code. Use a lock (http://effbot.org/zone/thread-synchronization.htm) if you need to serialize the output.
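
A minimal sketch of that advice applied to the question's script, using multiprocessing.Lock rather than the threading locks the linked page describes (the 5-second sleep and 15 workers are kept from the original):

import multiprocessing
import time

def sleep_then_write(lock, content):
    time.sleep(5)
    with lock:
        # Only one process at a time may write, so lines can no
        # longer interleave mid-string.
        print(content)

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    workers = [multiprocessing.Process(target=sleep_then_write,
                                       args=(lock, 'Hello World'))
               for _ in range(15)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print('Ah, what a hard day of threading...')
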
BlueRaja - Danny Pflughoeft
+1  A: 

multiprocessing works by starting several processes. Each process loads a copy of your script (that way it has access to the "target" function), and then runs the target function.

You get the bottom print statement 16 times because the statement sits at module level by itself and runs every time the module is loaded. Put it inside a main block and it won't:

if __name__ == "__main__":
    print('Ah, what a hard day of threading...')

Regarding the "AAh" - you have multiple processes going and they'll produce output as they run, so you simply have the "A" from one process next to the "Ah" from another.

When dealing with multi-process or multi-threaded environments, you have to think through locking and communication. This is not unique to multiprocessing; any concurrency library has the same issues.
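
For the communication side, one sketch (reusing the shape of the original script, not a definitive pattern) is to have the children send results back over a multiprocessing.Queue and let the parent do all of the printing, so only one process ever touches stdout:

import multiprocessing
import time

def sleep_then_report(queue, content):
    time.sleep(5)
    # Send the result to the parent instead of printing it here.
    queue.put(content)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=sleep_then_report,
                                       args=(queue, 'Hello World'))
               for _ in range(15)]
    for w in workers:
        w.start()
    # The parent is the only process that writes to stdout.
    for _ in workers:
        print(queue.get())
    for w in workers:
        w.join()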

Parand
@Parand What if I'm resizing 5 images at the same time and uploading them to S3 at the same time? Is there a chance my images could turn to goop because of this? Regarding locking, what benefit does concurrency give you if you're locking (i.e., nothing happens until the last thread/process is done)? That makes for an absolutely 100% worthless use of concurrency if I'm understanding correctly. The docs mention "side step the GIL" in the multiprocessing lib. If you're locking, isn't that like paying more to go to the front of the line just to tell the attendant that you'd like to start at the back?
orokusaki
+1 on the answer btw, thanks.
orokusaki
@orokusaki - the issue you're running into here is that you're using print, which means all processes use the same file handle (stdout), causing the output to intermingle. If you're uploading to S3 from multiple processes, each will open its own socket and send the file separately, so there shouldn't be any problems. Locking is used when you're accessing the same resource (e.g. stdout) from multiple threads of execution. In your S3 example it doesn't sound like there would be any resource conflict, so you probably don't have to worry about it.
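
A sketch of that scenario (resize_and_upload and the file names are hypothetical stand-ins, not a real resizing or S3 API): each worker handles its own file end to end, and only the parent prints, so no lock is needed.

import multiprocessing

def resize_and_upload(path):
    # Hypothetical worker: real code would resize the image and
    # upload it over this process's own S3 connection. Each file is
    # independent, so the workers never share a resource the way the
    # print statements shared stdout.
    return 'uploaded %s' % path

if __name__ == '__main__':
    paths = ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg']
    pool = multiprocessing.Pool(processes=5)
    try:
        # The parent collects the results and does all the printing.
        for result in pool.map(resize_and_upload, paths):
            print(result)
    finally:
        pool.close()
        pool.join()
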
Parand
Also, you can probably forget about "the GIL"; it's at a much lower level than what you're dealing with. If I understand what you're trying to do here, multiprocessing should work quite well for you.
Parand
@Parand, you have no idea how helpful that is for me. Thanks.
orokusaki