Why can you have millions of actors in an application, while 10,000 threads is already too many? How is it that creating millions of actors is practical, but more than a couple thousand threads is not? And what can threads do that actors can't (otherwise we would use actors all the time)?
Assuming Scala and the JVM:
Each thread reserves some amount of memory for its stack:
    java -X
        -Xss<size>    set java thread stack size
So creating many threads will eat up your memory.
Multiple actors, on the other hand, may share the same stack, making them much less hungry for memory.
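A quick way to see the per-thread cost (a sketch of mine, not part of the original answer; ManyThreads is just an illustrative name):

    object ManyThreads {
      def main(args: Array[String]): Unit = {
        // Each JVM thread reserves its own stack up front (-Xss, commonly
        // 512 KB to 1 MB), so 10,000 parked threads claim gigabytes of
        // address space before doing any useful work. On a constrained
        // machine this loop dies with "unable to create new native thread".
        val latch = new java.util.concurrent.CountDownLatch(1)
        val threads = (1 to 10000).map { _ =>
          val t = new Thread(() => latch.await()) // park until released
          t.start()
          t
        }
        println(s"Started ${threads.size} threads")
        latch.countDown() // release them so the JVM can exit
      }
    }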
You can generally have 10,000 threads in an application; there is no limit I know of that will stop you.
On the other hand, since few modern desktops have 10,000 processors, this is unlikely to be a good idea.
When you say "actors", do you mean the actor model? If so, it's an apples-to-oranges comparison: a thread is an actual running path of execution, while an actor is closer to a closure. A thread has allocated resources associated with it (at the very least, for green threads, an instruction pointer location, and more for kernel threads). An actor can be very minimal, as the sketch below shows.
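To make "very minimal" concrete, here is a toy sketch (my own illustration; TinyActor is a made-up name): an actor can be little more than a mailbox plus a behavior function, costing a heap object rather than a dedicated stack:

    import java.util.concurrent.ConcurrentLinkedQueue

    // Toy actor: a queue of messages plus a behavior. A real runtime would
    // multiplex step() calls for millions of these over a small thread pool.
    final class TinyActor[A](behavior: A => Unit) {
      private val mailbox = new ConcurrentLinkedQueue[A]
      def !(msg: A): Unit = mailbox.add(msg)
      def step(): Unit = Option(mailbox.poll()).foreach(behavior)
    }

    val greeter = new TinyActor[String](name => println(s"hello, $name"))
    greeter ! "world"
    greeter.step() // prints "hello, world"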
The Actor Model doesn't implicitly scale to millions of Actors. That's a detail of the implementation. For instance, from Scala Actors: A Short Tutorial:
"When actors call thread-blocking operations, such as receive (or even wait), the worker thread that is executing the current actor (self) is blocked. This means basically that the actor is represented as a blocked thread. Depending on the number of actors you want to use, you might want to avoid this, since most JVMs cannot handle more than a few thousand threads on standard hardware."
So it's possible to implement Actors with the same limitation as Threads (or even worse limitations in a pathological implementation).
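To illustrate the quoted caveat, here is a sketch assuming the old scala.actors library from that tutorial (since deprecated and removed from the standard distribution): `receive` parks the worker thread, while `react` hands it back to the pool until a message arrives.

    import scala.actors.Actor._

    val blocking = actor {
      receive { case msg => println("receive got: " + msg) } // holds a thread
    }
    val eventBased = actor {
      react { case msg => println("react got: " + msg) } // no thread held while waiting
    }

    blocking ! "hello"
    eventBased ! "world"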
Likewise, Threads are an abstract concept with no concrete resource requirements. Your 10,000-thread limit is for a specific implementation (likely kernel-level Windows threads or pthreads), not for Threads in general. In fact, there is ongoing research into user-level threads that scale to millions of threads.
Message Passing and Actors are a great way to manage concurrency, but they're not the only way. Before you switch to Actors-only, read Rich Hickey's explanation of why he didn't include Actors in Clojure.
Software transactional memory is another alternative to manage mutable shared state.
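For a flavor of STM on the JVM (a hedged sketch using the third-party ScalaSTM library, org.scala-stm:scala-stm, which follows the same refs-and-transactions model as Clojure; this is not from the original answer):

    import scala.concurrent.stm._

    val from = Ref(100)
    val to   = Ref(0)

    // Both writes commit together or not at all; conflicting transactions
    // are retried automatically instead of taking explicit locks.
    atomic { implicit txn =>
      from() = from() - 30
      to()   = to() + 30
    }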
Actors and Threads are not the same kind of object.
* A thread is an actual object on your machine: it eats a predictable amount of resources and has a precise overhead associated with it. Hence, if you describe 1,000,000 tasks by giving each of them a thread, you are explicitly asking your machine for 1,000,000 times the resources of one thread.
* An actor is more a way of describing a unit of work to your language, so it has no precise image at run time. Your compiler has much more freedom to use resources as it sees fit to accomplish the task you describe.
Threads are limited in number precisely because they consume resources in a well-defined way: each one takes memory, and switching to another thread has a cost. Past a certain number of threads you will exhaust the machine's memory, or the scheduling costs will dominate your program's execution. That is the kind of limit you hit with many threads, and for all practical purposes, on today's hardware, 10,000 threads is a lot.
But actors allow you to explain what you want to do in a way that is easier for your programming environment to understand. As such, it can manage resources much more efficiently, so that if you describe 1,000,000 tasks with actors, the environment has a much better chance of finding a way to compile and run them that is still manageable.

With respect to threads, most of the difference stems from the fact that since actors describe a distinct piece of processing, they can be run on any thread, allowing schemes that use an arbitrarily-sized pool of threads to process requests as they arrive (see the sketch below). You don't lock the compiler into a scheme that will kill its performance: it knows it can't spawn 1,000,000 threads, so it will try to accomplish the work another way, and will often succeed. Freedom in deciding the number of threads allows for much optimization, the first being to use as many threads as there are cores/CPUs on the machine, which yields optimal computational speed. Such a design is also much more resilient to deadlock, since you don't deal directly with timing and locking.
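As a sketch of that pooling scheme (my own illustration; MillionTasks is an invented name), here is a million actor-like tasks multiplexed over one thread per core:

    import java.util.concurrent.{Executors, TimeUnit}
    import java.util.concurrent.atomic.AtomicLong

    object MillionTasks {
      def main(args: Array[String]): Unit = {
        val cores = Runtime.getRuntime.availableProcessors()
        val pool  = Executors.newFixedThreadPool(cores) // one thread per core
        val done  = new AtomicLong
        // 1,000,000 tasks, but never more than `cores` threads exist.
        (1 to 1000000).foreach(_ => pool.execute(() => done.incrementAndGet()))
        pool.shutdown()
        pool.awaitTermination(1, TimeUnit.MINUTES)
        println(s"${done.get} tasks ran on $cores threads")
      }
    }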
As for what threads can do that actors can't: not much. But since threads are a closer representation of what is actually happening in your machine, you have closer control over what is actually going on. So with threads it is theoretically possible to achieve more performance, though in practice using threads to beat a good compiler armed with Actors borders on the impossible, because the model is so much harder to use, and because the compiler knows so many things that are nearly impossible for you to know.
Another thing, albeit not a theoretical limitation, that threads make possible and that actors won't be very good at, is dealing with libraries that use threads. Indeed, threads are the base model of many environments, and many libraries require you to use them in a specific manner. Actors may not cope well with that.
I can think of one more thing threads can do that actors can't: hard real-time (as in microsecond-order deadlines, nanosecond-order jitter, code proofs, and hard guarantees). I think it's possible to do real-time with an actor model, but as far as I'm aware there is no such implementation, so until then you strictly have to do hard real-time with threads.
So why haven't we been using actors from the very beginning? Like many good things in programming, it took time and experience to understand and implement better models. We have much better tools and languages, on average, than we had 30 years ago, and it is the same with Actors: they emerged more recently as a better way to do concurrency.