I'm using Python to talk to a hardware USB sniffer through the Python API provided by the vendor. I read USB packets from the device in a separate thread, in an infinite loop, and that part works fine. The problem is that my main thread never seems to get scheduled again: the read loop gets all the attention.

The code looks much like this:

from threading import Thread
import time
import sys
usb_device = 0

def usb_dump(usb_device):
    while True:
        #time.sleep(0.001)
        packet = ReadUSBDevice(usb_device)
        print "packet pid: %s" % packet.pid

class DumpThread(Thread):
    def run(self):
        usb_dump()

usb_device = OpenUSBDevice()
t = DumpThread()
t.start()
print "Sleep 1"
time.sleep(1)
print "End"
CloseUSBDevice(usb_device)
sys.exit(0)

(I could paste actual code, but since you need the hardware device I figure it won't help much).

I'm expecting this code to dump USB packets for about a second before the main thread terminates the entire program. However, all I see is "Sleep 1", and then the usb_dump() procedure runs forever. If I uncomment the "time.sleep(0.001)" statement in the inner loop of usb_dump(), things start working the way I expect, but then the Python code can no longer keep up with all the packets coming in :-(

The vendor tells me that this is a Python scheduler problem and not their API's fault, and therefore won't help me:

«However, it seems like you are experiencing some nuances when using threading in Python. By putting the time.sleep in the DumpThread thread, you are explicitly signaling to the Python threading system to give up control. Otherwise, it is up to the Python interpreter to determine when to switch threads and it usually does that after a certain number of byte code instructions have been executed.»

Can somebody confirm that Python is the problem here? Is there another way to make the DumpThread release control? Any other ideas?

A: 

I think the vendor is correct. Assuming this is CPython, there is no true parallel threading; only one thread can execute Python bytecode at a time. This is because of the global interpreter lock (GIL).

You may be able to achieve an acceptable solution by using the multiprocessing module, which effectively sidesteps the global interpreter lock by spawning true sub-processes.

Another possibility that may help is to modify the interpreter's thread-switching behaviour, for example with sys.setcheckinterval() in Python 2.
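As a rough sketch of what tuning the switching behaviour looks like: Python 2 exposes sys.setcheckinterval() (a count of bytecode instructions), while Python 3.2 and later expose sys.setswitchinterval() (a time in seconds). The values below are arbitrary examples, not recommendations. Note that this only affects switching between threads that are executing Python bytecode; it cannot help if a C extension blocks while still holding the GIL.

```python
import sys

if hasattr(sys, "setswitchinterval"):
    # Python 3.2+: ask the interpreter to consider a thread switch
    # every 0.5 ms (example value only).
    sys.setswitchinterval(0.0005)
    interval = sys.getswitchinterval()
else:
    # Python 2: consider a thread switch every 10 bytecode
    # instructions (example value only).
    sys.setcheckinterval(10)
    interval = sys.getcheckinterval()

print("thread switch setting: %s" % interval)
```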

ire_and_curses
+2  A: 

I'm assuming you wrote a Python C module that exposes the ReadUSBDevice function, and that it's intended to block until a USB packet is received, then return it.

The native ReadUSBDevice implementation needs to release the Python GIL while it's waiting for a USB packet, and then reacquire it when it receives one. This allows other Python threads to run while you're executing native code.

http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock

While you've unlocked the GIL, you can't access Python. Release the GIL, run the blocking function, then when you know you have something to return back to Python, re-acquire it.

If you don't do this, then no other Python threads can execute while your native blocking is going on. If this is a vendor-supplied Python module, failing to release the GIL during native blocking activity is a bug.

Note that if you're receiving many packets, and actually processing them in Python, then other threads should still run. Multiple threads which are actually running Python code won't run in parallel, but it'll frequently switch between threads, giving them all a chance to run. This doesn't work if native code is blocking without releasing the GIL.

edit: I see you mentioned this is a vendor-supplied library. If you don't have source, a quick way to see if they're releasing the GIL: start the ReadUSBDevice thread while no USB activity is happening, so ReadUSBDevice simply sits around waiting for data. If they're releasing the GIL, the other threads should run unimpeded. If they're not, it'll block the whole interpreter. That would be a serious bug.
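The check described above could be sketched roughly as follows. The helper name main_thread_ticks and the stand-in fake_blocking_read are made up for illustration; time.sleep is used in place of the vendor's ReadUSBDevice because it is known to release the GIL while waiting. To test the real library, you would pass a function that calls ReadUSBDevice on an idle device instead.

```python
import threading
import time

def main_thread_ticks(blocking_call, duration=0.5, tick=0.01):
    """Run blocking_call in a background thread and count how many
    times the main thread gets to run during `duration` seconds."""
    worker = threading.Thread(target=blocking_call)
    worker.daemon = True  # do not keep the process alive if the call hangs
    start = time.time()
    worker.start()
    ticks = 0
    while time.time() - start < duration:
        time.sleep(tick)
        ticks += 1
    worker.join(1.0)
    return ticks

def fake_blocking_read():
    # Hypothetical stand-in for ReadUSBDevice(usb_device) waiting on an
    # idle bus; time.sleep releases the GIL while it waits.
    time.sleep(0.5)

ticks = main_thread_ticks(fake_blocking_read)
print("main thread ran %d times while the read was blocking" % ticks)
```

If the blocking call releases the GIL, the count should be close to duration / tick. A count near zero would suggest the call holds the GIL for its whole wait.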

Glenn Maynard
I forgot to mention: ReadUSBDevice is a Python C module (as you expected), and it has a (default) 500 ms timeout, so it waits for packets for half a second and then returns. I'm guessing that it releases the Python GIL (whatever that is; I will read about it) on each iteration. However, since I'm running an infinite loop, the thread just starts another iteration. Python should still have been able to schedule the main thread on each iteration, right?
Vegar Westerlund
It should be able to run any other Python threads during the whole 500ms timeout, as long as it's releasing the GIL like it's supposed to. If it's not, I'd expect it to exit *eventually*--but possibly only after dozens of iterations over the 500ms timeout.
Glenn Maynard
Sort of odd that you marked an answer that came an hour later with less information than this one as the solution.
Glenn Maynard
+1  A: 

Your vendor would be right if yours were pure Python code; however, C extensions may release the GIL and therefore allow actual multithreading.

In particular, time.sleep does release the GIL (you can check it directly in the source code, here; look at the floatsleep implementation), so your code should not have any problem. As further proof, I also made a simple test, just removing the calls to USB, and it works as expected:

from threading import Thread
import time
import sys

usb_device = 0

def usb_dump():
    for i in range(100):
        time.sleep(0.001)
        print "dumping usb"

class DumpThread(Thread):
    def run(self):
        usb_dump()

t = DumpThread()
t.start()
print "Sleep 1"
time.sleep(1)
print "End"
sys.exit(0)

Finally, just a couple of notes on the code you posted:

  • usb_device is not being passed to the thread. You need to pass it as a parameter or (argh!) tell the thread to get it from the global namespace.
  • Instead of forcing sys.exit(), it could be better to just signal the thread to stop and then close the USB device. I suspect your code could run into multithreading issues as it is now.
  • If you just need a periodic poll, the threading.Timer class may be a better solution for you.
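The first two notes could look something like the sketch below: the device is passed to the thread explicitly, and the main thread signals the loop to stop with a threading.Event instead of calling sys.exit(). The read_packet function and the placeholder device are hypothetical stand-ins for the vendor's ReadUSBDevice and OpenUSBDevice, so the example runs without the hardware.

```python
import threading
import time

def read_packet(device, timeout=0.05):
    # Hypothetical stand-in for ReadUSBDevice(device): waits up to
    # `timeout` seconds and returns a fake packet.
    time.sleep(timeout)
    return {"pid": 0x2D}

class DumpThread(threading.Thread):
    def __init__(self, device):
        threading.Thread.__init__(self)
        self.device = device                     # passed in, not a global
        self.stop_requested = threading.Event()  # set by the main thread
        self.packets = []

    def run(self):
        # Check the stop flag once per read; the read timeout bounds
        # how long a stop request can take to be noticed.
        while not self.stop_requested.is_set():
            self.packets.append(read_packet(self.device))

    def stop(self):
        self.stop_requested.set()

device = object()  # placeholder for OpenUSBDevice()
t = DumpThread(device)
t.start()
time.sleep(0.3)
t.stop()
t.join()           # only after this would the device be closed
print("captured %d packets" % len(t.packets))
```

Joining the thread before closing the device avoids closing it while a read is still in progress.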

[Update] About the last point: as mentioned in the comments, I think a Timer would better fit the semantics of your function (a periodic poll) and would automatically avoid issues with the GIL not being released by the vendor code.
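A minimal sketch of the Timer idea, under the assumption that each poll re-arms itself: threading.Timer fires only once, so a periodic poll has to schedule the next Timer from inside the callback. The poll function, the polls list and the 50 ms interval are illustrative names and values; the actual device read would go where the timestamp is recorded.

```python
import threading
import time

polls = []

def poll(interval, stop_event):
    """Do one poll, then schedule the next one unless asked to stop."""
    if stop_event.is_set():
        return
    polls.append(time.time())  # a real device read would go here
    timer = threading.Timer(interval, poll, args=(interval, stop_event))
    timer.daemon = True        # do not block interpreter exit
    timer.start()

stop_event = threading.Event()
poll(0.05, stop_event)         # first poll runs immediately
time.sleep(0.3)                # main thread is free to do other work
stop_event.set()
print("polled %d times" % len(polls))
```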

Roberto Liffredo
On your notes: I know the code looks ugly. I started out with example code from the vendor and made as few changes as possible while still showing my point. I figured they would be familiar with their own example... This is not production code at all.
Vegar Westerlund
OK, so if I assume that the vendor C extension (which lies behind the ReadUSBDevice call) is not releasing the GIL (as it should), then adding a small sleep (as I tried) will make the code work as expected, because now I'm explicitly releasing the GIL on each iteration. This would explain the behavior I'm seeing. Is there another way to release the GIL? Perhaps I can add an if that sleeps every 50 rounds or something?
Vegar Westerlund
It is your vendor's responsibility to release the GIL, and you cannot do much about that. What about using a threading.Timer object? It would automatically implement the semantics of your function (periodic poll), and you would not have to care about their implementation.
Roberto Liffredo
I'll tell the vendor to fix their code then. In the meantime I'll try to manually force the GIL to be released at some interval. I cannot just use a threading.Timer object because I want to do other useful things while dumping the data (not shown in the sample code). Thanks for all the help.
Vegar Westerlund