I have a Python program that parallelizes its work using the multiprocessing module. It runs fine on a stand-alone multi-core machine, where it uses all cores, and on a cluster when I execute it directly from the shell.
However, when I run it through SGE (Sun Grid Engine), either through a job script or using qrsh, it fails with the following error:
Error encountered: <class 'thread.error'> : can't start new thread.
Traceback (most recent call last):
.
.
.
  File "/usr/local/lib/python2.7/threading.py", line 473, in start
    _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
This happens whether or not I specify a parallel environment (e.g. using the "#$ -pe mpi 8" directive).
I suspect that either I need to set up a parallel environment correctly, or tell SGE to allow multiple threads per processor, or both.
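One thing I can compare between the two cases is the set of resource limits the process actually sees. The following is a minimal sketch using the standard resource module, meant to be run once from the shell and once inside an SGE job. The list of limits is only my guess at what might affect thread creation, not a confirmed cause, and the helper name current_limits is made up for this example. A value of -1 normally means "unlimited".

```python
from __future__ import print_function
import resource

# Limits that could plausibly affect thread creation.
# This selection is an assumption, not a confirmed cause of the error.
LIMIT_NAMES = ["RLIMIT_NPROC", "RLIMIT_AS", "RLIMIT_STACK", "RLIMIT_DATA"]


def current_limits():
    """Return {name: (soft, hard)} for each limit available on this platform."""
    limits = {}
    for name in LIMIT_NAMES:
        const = getattr(resource, name, None)  # not every platform defines all of them
        if const is not None:
            limits[name] = resource.getrlimit(const)
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in sorted(current_limits().items()):
        print(name, "soft =", soft, "hard =", hard)
```

If the output differs between the shell and the SGE job, that would at least narrow down which limit to look at.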
Does anyone have any insight or suggestions?
Thanks!
p.s. The actual program is large and part of a larger library, so I did not post the code directly. If need be, the code can be checked out from its repo. But I think this is a general issue with Python multiprocessing and SGE rather than anything specific to this particular program.
p.p.s. The following code is sufficient to trigger the error:
from multiprocessing import Queue
q = Queue()
q.put(1)
The error is triggered whether the job runs in a parallel or a serial environment.
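Since the traceback ends in threading.py, and as far as I understand Queue.put() starts a background feeder thread on first use, the failure may not be specific to multiprocessing at all. Here is a small sketch that tries to start a plain thread directly, to check that assumption. The function name can_start_thread is just for illustration.

```python
from __future__ import print_function
import threading


def can_start_thread():
    """Try to start one trivial thread; return (True, None) or (False, message)."""
    t = threading.Thread(target=lambda: None)
    try:
        t.start()
    except Exception as exc:  # thread.error on Python 2, RuntimeError on Python 3
        return False, str(exc)
    t.join()
    return True, None


if __name__ == "__main__":
    ok, message = can_start_thread()
    print("thread started" if ok else "thread failed: " + message)
```

If this also fails under SGE, the problem would seem to be thread creation in general within the job's environment rather than anything in multiprocessing itself.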