I've written a web crawler that I'd like to be able to stop via the keyboard. I don't want the program to die when I interrupt it; it needs to flush its data to disk first. I also don't want to catch KeyboardInterrupt, because the persistent data could be in an inconsistent state at the moment the exception is raised.
My current solution is to define a signal handler that catches SIGINT and sets a flag; each iteration of the main loop checks this flag before processing the next URL.
However, I've found that if the system happens to be executing socket.recv() when I send the interrupt, I get this:
^C
Interrupted; stopping... // indicates my interrupt handler ran
Traceback (most recent call last):
  File "crawler_test.py", line 154, in <module>
    main()
  ...
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/socket.py", line 397, in readline
    data = recv(1)
socket.error: [Errno 4] Interrupted system call
and the process exits completely. Why does this happen? Is there a way I can prevent the interrupt from affecting the system call?