So I just finished watching this talk on the Python Global Interpreter Lock (GIL) http://blip.tv/file/2232410.

The gist of it is that the GIL is a pretty good design for single-core systems (Python essentially leaves thread handling/scheduling up to the operating system), but that this can seriously backfire on multi-core systems: you end up with I/O-intensive threads being heavily blocked by CPU-intensive threads, the expense of context switching, the ctrl-C problem[*], and so on.

Since the GIL basically limits a Python program to executing on one CPU, my thought is: why not accept this and simply use taskset on Linux to set the program's affinity to a particular core/CPU (especially when multiple Python apps are running on a multi-core system)?
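For comparison, launching with `taskset -c N python app.py` pins the whole program before it starts; on Linux a similar effect can be had from inside Python via `os.sched_setaffinity`. A minimal sketch, assuming Linux and Python 3.3+; the helper name `pin_to_one_cpu` is hypothetical. Note that on Linux a pid of 0 affects the calling thread, and threads created afterwards inherit the mask, so this should be called at startup before any worker threads exist.

```python
import os


def pin_to_one_cpu():
    """Hypothetical helper: restrict this program to a single CPU (Linux only).

    Call it before starting any threads: with pid 0, Linux applies the mask
    to the calling thread, and threads created later inherit it.
    Returns the CPU chosen, or None where the API is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS/Windows: the os module lacks sched_setaffinity
    allowed = os.sched_getaffinity(0)  # CPUs we are currently allowed to use
    cpu = min(allowed)                 # pick one of them (lowest-numbered)
    os.sched_setaffinity(0, {cpu})
    return cpu


if __name__ == "__main__":
    chosen = pin_to_one_cpu()
    if chosen is not None:
        print("pinned to CPU", chosen, "->", os.sched_getaffinity(0))
```

Whether this helps or hurts is exactly what benchmarking has to show; the sketch only makes the experiment easy to toggle from code.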

So ultimately my question is this: has anyone tried using taskset on Linux with Python applications (especially when running multiple applications on one system, so that multiple cores are used with one or two Python applications bound to each core), and if so, what were the results? Is it worth doing? Does it make things worse for certain workloads? I plan to test this myself (basically to see whether the program takes more or less time to run), but would love to hear about others' experiences.

Addition: David Beazley (the speaker in the linked video) pointed out that some C/C++ extensions manually release the GIL, and if these extensions are optimized for multi-core (e.g. scientific or numeric data analysis), then rather than getting the benefit of multiple cores for number crunching, the extension would effectively be crippled by being limited to a single core (potentially slowing your program down significantly). On the other hand if you aren't using extensions such as this

The reason I am not using the multiprocessing module is that (in this case) part of the program is heavily network I/O bound (HTTP requests), so a pool of worker threads is a GREAT way to squeeze performance out of a box: a thread fires off an HTTP request, gives up the GIL while it waits on I/O, and another thread can do its thing. That part of the program can easily run 100+ threads without hurting the CPU much and actually use the available network bandwidth. As for Stackless Python etc., I'm not overly interested in rewriting the program or replacing my Python stack (availability would also be a concern).

[*] Only the main thread runs Python signal handlers, so if you send a ctrl-C the interpreter basically tries to get the main thread to run so it can handle the signal. But since it doesn't directly control which thread runs (that is left to the operating system), it basically keeps letting the OS switch threads until it eventually hits the main thread (which, if you are unlucky, may take a while).
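The main-thread restriction is easy to observe in CPython: `signal.signal()` refuses to install a handler from any other thread, and the Python-level handler runs in the main thread even when the signal is raised from a worker. A minimal sketch, assuming a POSIX system (for `SIGUSR1`) and Python 3.8+ (for `signal.raise_signal`); `handler` and `main` are illustrative names.

```python
import signal
import threading
import time

seen_in = []  # names of the threads in which the Python-level handler ran


def handler(signum, frame):
    seen_in.append(threading.current_thread().name)


def try_install_from_worker(errors):
    # Installing a handler from any thread other than the main one raises ValueError.
    try:
        signal.signal(signal.SIGUSR1, handler)
    except ValueError as exc:
        errors.append(exc)


def main():
    seen_in.clear()
    errors = []
    t = threading.Thread(target=try_install_from_worker, args=(errors,))
    t.start()
    t.join()

    signal.signal(signal.SIGUSR1, handler)  # allowed: we are the main thread

    # Raise the signal from a worker thread...
    t = threading.Thread(target=signal.raise_signal, args=(signal.SIGUSR1,))
    t.start()
    t.join()
    # ...the Python handler only runs once the main thread executes bytecode again.
    for _ in range(100):
        if seen_in:
            break
        time.sleep(0.01)
    return errors, list(seen_in)


if __name__ == "__main__":
    errors, where = main()
    print("install from worker failed:", bool(errors))
    print("handler ran in:", where)
```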

+5  A: 

Another solution is: http://docs.python.org/library/multiprocessing.html

Note 1: This is not a limitation of the Python language, but of the CPython implementation.

Note 2: With regard to affinity, your OS shouldn't have a problem doing that itself.
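With `multiprocessing`, each worker is a separate interpreter with its own GIL, so CPU-bound work can spread across cores. A minimal sketch; `count_tags` is a hypothetical stand-in for the CPU-heavy step (such as parsing HTML).

```python
from multiprocessing import Pool


def count_tags(html):
    # Hypothetical CPU-bound step: a stand-in for real parsing work.
    return html.count("<")


if __name__ == "__main__":
    pages = ["<html><body><p>hi</p></body></html>"] * 8
    with Pool() as pool:  # defaults to os.cpu_count() worker processes
        print(pool.map(count_tags, pages))  # prints [6, 6, 6, 6, 6, 6, 6, 6]
```

The `if __name__ == "__main__":` guard matters on platforms where child processes re-import the main module.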

ynimous
Unfortunately that won't work too well for me: part of the program is heavily network I/O bound (HTTP requests), so threading is a really lightweight way to squeeze a ton of performance out of the box (100+ threads grabbing stuff with almost no CPU load in that part of the program, since I'm waiting hundreds of milliseconds for remote servers to respond).
Kurt
If you are using threads for I/O, the GIL shouldn't (normally) be a problem. Python's I/O routines are generally written in C and release the GIL during their operation. The big problems with the GIL arise when multiple threads need to run Python bytecode in the interpreter.
ynimous
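A quick way to see this: a blocking call such as `time.sleep` (standing in here for waiting on a remote HTTP server) releases the GIL, so many threads can wait at the same time. A minimal sketch; `fake_request` is hypothetical, and in real code it would be something like `urllib.request.urlopen(url).read()`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(i):
    # Hypothetical stand-in for an HTTP request: a blocking sleep releases the GIL.
    time.sleep(0.2)
    return i


def run(n_requests=50, n_threads=50):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(fake_request, range(n_requests)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run()
    # 50 sequential 0.2 s waits would take about 10 s; the threads overlap them.
    print(len(results), "requests in", round(elapsed, 2), "s")
```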
Yeah, my app has several pools of threads; some are more I/O bound, some hit the CPU pretty hard (BeautifulSoup on a 100k, badly formed HTML document is no picnic).
Kurt
Moreover, I'd suggest taking a look at http://twistedmatrix.com/ if you are writing network applications. It will make your life a lot easier once you get past its learning curve.
ynimous
Is there a reason you can't use the multiprocessing module for the CPU-bound threads?
ynimous
Right now the app is pretty easy: since it uses threads, I have shared memory and work queues, which make handling everything really simple (and pretty efficient). I'd rather not break it up and start on the whole message-passing approach. Plus the CPU-bound stuff isn't that bad, and leaving it in the threads is a whole lot simpler than breaking it all apart. As for Twisted, that's way overkill for what I need.
Kurt
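The threads-plus-shared-work-queue layout described above can be sketched with the standard `queue` module. A minimal sketch; `process` is a hypothetical per-item job, and `None` is used as a shutdown sentinel.

```python
import queue
import threading


def process(item):
    # Hypothetical per-item job (fetching, parsing, ...).
    return item * item


def worker(tasks, results, lock):
    while True:
        item = tasks.get()
        if item is None:          # sentinel: no more work for this thread
            tasks.task_done()
            break
        value = process(item)
        with lock:                # shared memory: guard the shared dict
            results[item] = value
        tasks.task_done()


def run(items, n_threads=4):
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)           # one sentinel per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run(range(5)))
```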
It'd be nice if a question could have a talk page à la Wikipedia so the answers don't get cluttered up.
Kurt
+1  A: 

I've found the following rule of thumb sufficient over the years: if the workers are dependent on some shared state, I use one multiprocessing process per core (CPU bound), and per core a fixed pool of worker threads (I/O bound). The OS will take care of assigning the different Python processes to the cores.
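The rule of thumb (one process per core, each with a fixed pool of I/O threads) might look roughly like this. A minimal sketch; `fetch` is a hypothetical blocking call, and the round-robin split and thread count of 8 are arbitrary illustrative choices.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

THREADS_PER_PROCESS = 8  # illustrative; tune to the I/O latency you observe


def fetch(url):
    # Hypothetical blocking I/O call (e.g. an HTTP GET); sleeping releases the GIL.
    time.sleep(0.05)
    return len(url)


def handle_batch(urls):
    # Runs inside one worker process: a fixed thread pool handles the I/O.
    with ThreadPoolExecutor(max_workers=THREADS_PER_PROCESS) as pool:
        return list(pool.map(fetch, urls))


def run(urls, n_procs=None):
    n_procs = n_procs or os.cpu_count() or 1         # one process per core
    batches = [urls[i::n_procs] for i in range(n_procs)]  # round-robin split
    with ProcessPoolExecutor(max_workers=n_procs) as procs:
        # The OS scheduler decides which core each process runs on.
        return sum(procs.map(handle_batch, batches), [])


if __name__ == "__main__":
    urls = [f"http://example.com/{i}" for i in range(40)]
    print(len(run(urls)))
```

Results come back grouped by batch, not in input order; return `(url, value)` pairs if order matters.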

wr
You might want to watch the video I linked to; Python can show some potentially very dumb/ugly behavior because it leaves all the thread scheduling to the OS.
Kurt
I did, and my suggestion to overcome this is to use multiprocessing to divide the CPU load across the processors (assuming not much information has to be shared between those processes), and to use threads for the I/O demands.
wr
+1  A: 

I have never heard of anyone using taskset for a performance gain with Python. Doesn't mean it can't happen in your case, but definitely publish your results so others can critique your benchmarking methods and provide validation.

Personally though, I would decouple your I/O threads from the CPU bound threads using a message queue. That way your front end is now completely network I/O bound (some with HTTP interface, some with message queue interface) and ideal for your threading situation. Then the CPU intense processes can either use multiprocessing or just be individual processes waiting for work to arrive on the message queue.
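A rough single-machine version of that decoupling, using `multiprocessing.Queue` in place of an external message queue. A minimal sketch; `download` and `parse` are hypothetical stand-ins, and a real deployment would use a message broker so the back end can move to other machines.

```python
import threading
import time
from multiprocessing import Process, Queue


def download(url):
    # Hypothetical I/O-bound step (HTTP request); sleeping releases the GIL.
    time.sleep(0.05)
    return "<p>" + url + "</p>"


def parse(doc):
    # Hypothetical CPU-bound step, run in a separate process.
    return doc.count("<")


def io_thread(urls, work_q):
    for url in urls:
        work_q.put(download(url))


def cpu_worker(work_q, result_q):
    while True:
        doc = work_q.get()
        if doc is None:               # sentinel: shut down
            break
        result_q.put(parse(doc))


def run(urls, n_io_threads=4, n_cpu_procs=2):
    work_q, result_q = Queue(), Queue()
    procs = [Process(target=cpu_worker, args=(work_q, result_q))
             for _ in range(n_cpu_procs)]
    for p in procs:
        p.start()
    chunks = [urls[i::n_io_threads] for i in range(n_io_threads)]
    threads = [threading.Thread(target=io_thread, args=(c, work_q)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for _ in procs:
        work_q.put(None)              # one sentinel per worker process
    results = [result_q.get() for _ in urls]  # drain before joining the processes
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(run([f"u{i}" for i in range(10)]))
```

Draining `result_q` before `join()` avoids the documented deadlock where a child cannot exit while it still has queued data to flush.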

In the longer term you might also want to consider replacing your threaded I/O front end with Twisted or something like eventlet because, even if they won't help performance, they should improve scalability. Your back end is already scalable, because you can run your message queue over any number of machines and CPUs as needed.

Van Gale
A: 

The Python GIL is per Python interpreter. That means the only way to avoid problems with it while doing multiprocessing is simply to start multiple interpreters (i.e. use separate processes instead of threads for concurrency) and then use some other IPC primitive for communication between the processes (such as sockets). That being said, the GIL is not a problem when using threads with blocking I/O calls.

The main problem of the GIL, as mentioned earlier, is that you can't execute two different Python threads' code at the same time. A thread blocked on an I/O call is not executing Python code, so it does not hold the GIL. If you have two CPU-intensive tasks in separate Python threads, that's where the GIL kills multi-processing in Python (only in the CPython implementation, as pointed out earlier): the GIL stops CPU #1 from executing one Python thread while CPU #0 is busy executing the other.
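The effect can be measured directly: the same pure-Python CPU-bound function run on two threads versus two processes. A minimal sketch; `busy` is a hypothetical CPU-bound loop, and actual timings depend on the machine and the Python build, so none are claimed here.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def busy(n):
    # Pure-Python loop: holds the GIL for as long as it runs.
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(executor_cls, n=2_000_000, workers=2):
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as ex:
        results = list(ex.map(busy, [n] * workers))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, t_threads = timed(ThreadPoolExecutor)
    _, t_procs = timed(ProcessPoolExecutor)
    # On a multi-core machine with a standard (GIL) CPython build, the threaded
    # run is roughly serial, while the processes can use two cores.
    print(f"threads: {t_threads:.2f}s  processes: {t_procs:.2f}s")
```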

Merijn
A: 

Until such time as the GIL is removed from Python, co-routines may be used in place of threads. I have it on good authority that this strategy has been implemented by two successful start-ups, using greenlets in at least one case.
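The answer mentions greenlets; Python's standard library offers coroutines through `asyncio`, swapped in here purely to illustrate the same idea (one thread, many concurrent waits). A minimal sketch; `fake_fetch` is hypothetical, and `asyncio.sleep` stands in for a non-blocking network call.

```python
import asyncio
import time


async def fake_fetch(i):
    # Hypothetical network call: awaiting yields control to other coroutines.
    await asyncio.sleep(0.2)
    return i


async def crawl(n):
    # gather() runs the coroutines concurrently and returns results in order.
    return await asyncio.gather(*(fake_fetch(i) for i in range(n)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(crawl(100))
    print(len(results), "fetches in", round(time.perf_counter() - start, 2), "s")
```

Like threads, coroutines only help with waiting; CPU-bound work still runs on one core.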