I've written a web crawler that I'd like to be able to stop via the keyboard. I don't want the program to die when I interrupt it; it needs to flush its data to disk first. I also don't want to catch KeyboardInterrupt, because the exception could arrive at any point and leave the persistent data in an inconsistent state.

My current solution is to define a signal handler that catches SIGINT and sets a flag; each iteration of the main loop checks this flag before processing the next URL.
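A minimal sketch of that flag-based approach, for reference. The names stop_requested, handle_sigint and crawl are hypothetical stand-ins, not the actual crawler code, and process is whatever function handles a single URL.

```python
import signal

# Hypothetical flag; the handler only sets it and does no other work.
stop_requested = False


def handle_sigint(signum, frame):
    """SIGINT handler: record that a stop was requested."""
    global stop_requested
    stop_requested = True


def crawl(urls, process):
    """Process URLs in order, stopping before the next one once the flag is set."""
    done = []
    for url in urls:
        if stop_requested:
            break
        process(url)
        done.append(url)
    return done


if __name__ == "__main__":
    signal.signal(signal.SIGINT, handle_sigint)
    finished = crawl(["http://example.com/"], lambda url: None)
    # Flush persistent data here, after the loop has exited cleanly.
```

Because the handler only sets a flag, the data is never touched mid-update; the flush happens after the loop ends.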

However, I've found that if the system happens to be executing socket.recv() when I send the interrupt, I get this:

^C
Interrupted; stopping...  // indicates my interrupt handler ran
Traceback (most recent call last):
  File "crawler_test.py", line 154, in <module>
    main()
  ...
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/socket.py", line 397, in readline
    data = recv(1)
socket.error: [Errno 4] Interrupted system call

and the process exits completely. Why does this happen? Is there a way I can prevent the interrupt from affecting the system call?

A:

socket.recv() calls the underlying POSIX recv() function in the C layer. When the process receives a SIGINT while blocked in recv() waiting for data, the C call fails and sets errno to EINTR (error code 4 on your system). In C you would check for this value to tell that recv() returned because a signal arrived, not because data became available on the socket. Python turns the error code into a socket.error exception, and since nothing catches it, it terminates your application with the traceback you see. The solution is to catch socket.error, check the error code, and if it is EINTR, swallow the exception so your main loop can go on to check its stop flag; re-raise any other error. Something like this:

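import errno
import socket

try:
    # do something
    result = conn.recv(bufsize)
except socket.error as e:
    # EINTR (errno 4 here) means a signal interrupted the call; anything else is a real error
    if e.args[0] != errno.EINTR:
        raise
    result = None  # nothing was read; let the main loop check the stop flag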
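If you would rather finish the read than drop it, a variation is to retry the call whenever it is interrupted. This is only a sketch: recv_retrying is a hypothetical helper name, and conn is assumed to be a connected socket object.

```python
import errno
import socket


def recv_retrying(conn, bufsize):
    """Call conn.recv(), retrying whenever a signal interrupts it with EINTR."""
    while True:
        try:
            return conn.recv(bufsize)
        except socket.error as e:
            if e.args[0] != errno.EINTR:
                raise  # a real socket error, not an interruption
            # Interrupted by a signal: loop around and call recv() again.
```

Note that the retried recv() blocks again until data arrives, so the stop flag is only checked once the current read completes. On Python 3.5 and later the interpreter already retries interrupted system calls itself when the signal handler returns without raising (PEP 475), so a loop like this mainly matters on older versions such as the 2.6 shown in the traceback.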
Tamás
Great explanation, thank you.
danben