You are hitting a hard limit. As others have said, there are two possible limitations:
- the number of threads a process can spawn is limited by the OS (either globally or per process)
- the memory available is limited and each thread reserves its own stack (typically a few MB; at 4 MB each, 900 threads reserve about 3.6 GB)
Incidentally, this is what is so interesting about Go's goroutines. Instead of spawning as many threads as possible, the Go runtime adapts the number of threads running Go code to the number of cores available, and multiplexes the goroutines onto these OS threads itself.
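To see what the runtime has picked on a given machine, a minimal sketch along these lines should work. It uses `runtime.NumCPU` and `runtime.GOMAXPROCS` from the standard library; the helper name `schedulerInfo` is made up for illustration. By default the two values are usually equal, though that can vary with the Go version and environment.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerInfo (hypothetical helper) returns the number of logical CPUs
// and the current GOMAXPROCS value, i.e. the cap on OS threads that may
// execute Go code simultaneously.
func schedulerInfo() (cpus, maxProcs int) {
	// Passing 0 to GOMAXPROCS queries the setting without changing it.
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	cpus, maxProcs := schedulerInfo()
	fmt.Printf("logical CPUs: %d, GOMAXPROCS: %d\n", cpus, maxProcs)
}
```

Note that threads blocked in system calls do not count against this cap, so the process may own a few more OS threads than the number printed.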
Furthermore, goroutines are lightweight: each one starts with only a few KB of stack, because they don't use a traditional fixed-size stack. The stack grows on demand instead, so stack overflows become far less of a concern. This means you can effectively spawn thousands of goroutines on a typical machine and it won't cost you much.
If you wish to experiment with extreme parallelism:
- find out how to reduce the stack space allocated per thread (beware of stack overflows)
- switch to Go, or find another language implementing goroutine-like lightweight threads