views: 228 · answers: 4

Let's say we have a system like this:

                                                                     ______
                              { application instances ---network--- (______)
                             {  application instances ---network--- |      |
requests ---> load balancer {   application instances ---network--- | data |
                             {  application instances ---network--- | base |
                              { application instances ---network--- \______/

A request comes in, a load balancer sends it to an application server instance, and the app server instances talk to a database (elsewhere on the LAN). The application instances can either be separate processes or separate threads. Just to cover all the bases, let's say there are several identical processes, each with a pool of identical application service threads.

If the database is performing slowly, or the network gets bogged down, clearly the throughput of request servicing is going to get worse.

Now, in all my pre-Python experience, this would be accompanied by a corresponding drop in CPU usage by the application instances -- they'd be spending more time blocking on I/O and less time doing CPU-intensive things.

However, I'm being told that with Python, this is not the case -- under certain Python circumstances, this situation can cause Python's CPU usage to go up, perhaps all the way to 100%. Something about the Global Interpreter Lock and the multiple threads supposedly causes Python to spend all its time switching between threads, checking to see if any of them have an answer yet from the database. "Hence the rise in single-process event-driven libraries of late."

Is that correct? Do Python application service threads actually use more CPU when their I/O latency increases?

+1  A: 

The key is to launch the application instances in separate processes. Otherwise multi-threading issues are likely to follow.
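A minimal sketch of that advice, using the standard-library `multiprocessing` module (the `handle_request` worker is a hypothetical stand-in for real request handling): each worker process is an independent interpreter with its own GIL, so they cannot contend with each other.

```python
# Sketch: sidestep GIL contention by using separate processes instead
# of threads. handle_request is a hypothetical stand-in for real work.
from multiprocessing import Pool

def handle_request(n):
    # Placeholder for real request handling (CPU work plus I/O).
    return n * n

if __name__ == "__main__":
    # Four independent interpreter processes, each with its own GIL.
    with Pool(processes=4) as pool:
        results = pool.map(handle_request, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The `if __name__ == "__main__":` guard matters on platforms that spawn (rather than fork) worker processes.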

Tom Leys
+1  A: 

No, this is not the case. Stop spreading the FUD.

If your Python app is blocked on a C API call (e.g. a blocking socket or file read), it has probably released the GIL.
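A quick way to see this (a hedged sketch; timings are approximate and machine-dependent), using `time.sleep` as a stand-in for any blocking C-level call: because the GIL is released during the block, the sleeps overlap instead of serializing.

```python
import threading
import time

def blocking_io():
    time.sleep(0.5)  # blocks in C; the GIL is released while sleeping

start = time.monotonic()
threads = [threading.Thread(target=blocking_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# If the GIL were held during the sleep, this would take ~2 s;
# because it's released, all four sleeps overlap and it takes ~0.5 s.
print(f"elapsed: {elapsed:.2f}s")
```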

Unknown
+1  A: 

Something about the Global Interpreter Lock and the multiple threads supposedly causes Python to spend all its time switching between threads, checking to see if any of them have an answer yet from the database.

That is completely baseless. If all threads are blocked on I/O, Python should use 0% CPU. If there is one unblocked thread, it will be able to run without GIL contention; it will periodically release and reacquire the GIL, but it doesn't do any work "checking up" on the other threads.
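This claim is easy to check directly (a sketch, with `time.sleep` standing in for a blocking database call): compare the process's CPU time against wall-clock time while several threads are blocked.

```python
import threading
import time

def wait_on_io():
    time.sleep(1.0)  # stands in for a blocking database call

threads = [threading.Thread(target=wait_on_io) for _ in range(8)]
wall_start = time.monotonic()
cpu_start = time.process_time()
for t in threads:
    t.start()
for t in threads:
    t.join()
wall = time.monotonic() - wall_start
cpu = time.process_time() - cpu_start

# Eight threads "blocked on the database" for a second of wall time
# cost almost no CPU time -- there is no busy-wait polling of threads.
print(f"wall: {wall:.2f}s, cpu: {cpu:.3f}s")
```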

However, on multicore systems it is possible for a thread to have to wait a while to reacquire the GIL when a CPU-bound thread is running, and for response times to suffer (see this presentation for details). This shouldn't be an issue for most servers, though.

Miles
+6  A: 

In theory, no; in practice, it's possible. It depends on what you're doing.

There's a full hour-long video and a PDF about it, but essentially it boils down to some unforeseen consequences of the GIL with CPU-bound vs. I/O-bound threads on multicore machines. Basically, a thread waiting on I/O needs to wake up, so Python begins "pre-empting" other threads every Python "tick" (instead of every 100 ticks). The I/O thread then has trouble taking the GIL from the CPU-bound thread, causing the cycle to repeat.
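A rough sketch of the symptom (not the internal mechanism): an I/O-bound thread's wakeup latency can suffer when a CPU-bound thread is holding the GIL between its short waits. The magnitude varies considerably by machine and by Python version, since the talk describes the old tick-based GIL and Python 3.2+ replaced it.

```python
import threading
import time

stop = False
latencies = []

def cpu_bound():
    # Pure-bytecode work that holds the GIL between switches.
    while not stop:
        sum(range(10000))

def io_bound():
    for _ in range(20):
        t0 = time.monotonic()
        time.sleep(0.001)  # the "I/O" completes almost immediately...
        # ...but waking up requires reacquiring the GIL from cpu_bound.
        latencies.append(time.monotonic() - t0)

hog = threading.Thread(target=cpu_bound)
hog.start()
io_bound()
stop = True
hog.join()
print(f"max wakeup latency: {max(latencies) * 1000:.1f} ms")
```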

That's grossly oversimplified, but that's the gist of it. The video and slides have more information. It manifests itself as a larger problem on multi-core machines. It could also occur if the process receives signals from the OS (since that triggers the thread-switching code, too).

Of course, as other posters have said, this goes away if each instance has its own process.

Coincidentally, the slides and video explain why you can't CTRL+C in Python sometimes.

Richard Levasseur
Re: "a thread waiting on IO needs to wake up, so Python begins pre-empting other threads every Python tick": You're conflating two separate issues from the talk. The "release/acquire GIL every tick" issue is only the case with signals, and won't come into play with normal blocking I/O.
Miles
Yes, but, from what I understand, it uses signals (the signal to the GIL semaphore) to communicate "Hey, I'm ready to run". The next check() then sees this, goes into "check every tick" mode, then immediately releases and reacquires the GIL, giving the OS a chance to run another thread. But then a thread monopolizes the GIL, starving the others.
Richard Levasseur
I should clarify that comment since it sounds the same as what I said in my answer: I'm under the impression that, even after receiving I/O, when the I/O thread wakes up, it uses a signal to communicate that it wants the GIL.
Richard Levasseur
When an I/O thread wants to reacquire the GIL, if another thread currently has it, the I/O thread will *wait* for a "signal" (via the synchronization primitives being used) from the running thread. But this is distinct from the signal handlers registered by Python, which set a "pendingcalls_to_do" flag that causes the running thread to go into churn mode in order to attempt to run those pending calls on the main thread. Python doesn't handle the "signals" used for semaphores or thread scheduling, and it doesn't need to involve the main thread to switch between threads.
Miles
Ok, so it's only external signals (real, honest-to-god Unix signals, like CTRL+C) that cause the "check every tick" churn? That makes sense. My confusion resurfaces because of slides 29-31, where, on a single core, two CPU-bound threads have horrible performance, and he says it's because of the GIL signaling going on. Can the same apply if it's a combination of CPU-bound and I/O-bound threads?
Richard Levasseur
Right, "GIL signaling" refers to the threading/synchronization library, not the Unix signals that Python handles. Based on my testing, as long as there's only one CPU-bound thread, the presence of threads doing I/O doesn't have the same horrible GIL-churning effect. You do see an impact on the latency of the I/O threads, though -- but ironically the latency decreases if I/O is more frequent. Clearly, though, the best thing to do is to avoid mixing multithreading and CPU-bound Python bytecode.
Miles
Cool, thank you for the clarification. I updated my answer.
Richard Levasseur