I'm quite new to threading in Python and have a couple of beginner questions.

When I start more than, say, fifty threads with the Python threading module, I start getting MemoryError. The threads themselves are very slim and not memory hungry, so it seems to be the overhead of threading itself that causes the memory issues.

  • Is there something I can do to increase the memory capacity or otherwise make Python allow for a larger number of threads?
  • What is the maximum number of threads you've been able to run in your Python code using the threading module? Did you do any tricks to achieve that number?
  • Are there any other caveats to be aware of when using the threading module?
+3  A: 

Your question cannot be answered in a general way, as good use of threading always depends on the concrete problem to be solved. You also don't say which Python implementation you are using, so I assume you use the "default" CPython and not IronPython or something like that. Here are some hints and ideas to help you think further about your problem:

  • Why do you need so many threads? Your machine will probably not be able to run them in parallel anyway.
  • Have a look at Stackless Python. I don't know the current status of the project, but I think it was designed for this kind of problem.
  • The global interpreter lock prevents pure Python code from really running in parallel. C extension code can release the lock and run in parallel, though, so in real life it's sometimes hard to guess how Python will behave regarding parallelization.
  • Python has many good libraries. Check whether one of them already has a solution for your design problem. If your problem is network related, have a look at Twisted, for example.
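
To make the first point concrete, here is a minimal sketch of capping the number of threads with a fixed-size pool instead of starting one thread per task. It assumes Python 3, where concurrent.futures is in the standard library; handle, run_jobs and the limit of 8 workers are hypothetical names and values chosen for illustration, not something from the question.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def handle(job):
    # Hypothetical stand-in for the real per-item work (a request, file I/O, ...).
    # Returns the result plus the name of the thread that produced it.
    return job * 2, threading.current_thread().name


def run_jobs(jobs, max_workers=8):
    # The pool never holds more than max_workers threads,
    # no matter how many jobs are submitted.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input jobs.
        return list(pool.map(handle, jobs))


results = run_jobs(range(200))
values = [value for value, _ in results]
thread_names = {name for _, name in results}
```

Two hundred jobs are processed here, but thread_names never contains more than eight entries, so the per-thread memory overhead stays bounded.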
Achim
I'm using CPython.
knorv
+1 for Stackless Python
kotlinski
+1  A: 

Eventlet's green threads are designed for low memory consumption. Its general-purpose spawn call makes it easy to start new ones.

RSabet
Eventlets looks very good! Thanks!
knorv
Also take a look at gevent, which was built to fix some of Eventlet's bugs and has much better performance.
Denis Bilenko
+1  A: 

The Global Interpreter Lock is known to limit how much threads can speed up code on standard CPython. The multiprocessing module's documentation therefore notes:

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.

The GIL probably isn't the cause of your MemoryErrors, but it is something to be aware of.
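
As a rough illustration of that API, here is a minimal sketch that spreads a CPU-bound task over worker processes with multiprocessing.Pool. It assumes Python 3; count_primes, count_primes_parallel and the default of two workers are made-up names and values for this example.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound toy task: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..sqrt(n) divides it.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=2):
    # Each worker is a separate process with its own interpreter and its own GIL,
    # so the calls to count_primes can run on different cores.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)
```

Call count_primes_parallel from under an `if __name__ == "__main__":` guard, since on some platforms the worker processes re-import the main module. Keep in mind that a process is heavier than a thread, so this helps with CPU-bound work rather than with running hundreds of lightweight tasks.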

msw