views:

2622

answers:

4

Let me first say that I have quite a lot of Java experience, but have only recently become interested in functional languages. Recently I've started looking at Scala, which seems like a very nice language.

However, I've been reading about Scala's Actor framework in Programming in Scala, and there's one thing I don't understand. In chapter 30.4 it says that using react instead of receive makes it possible to re-use threads, which is good for performance, since threads are expensive in the JVM.

Does this mean that, as long as I remember to call react instead of receive, I can start as many Actors as I like? Before discovering Scala, I've been playing with Erlang, and the author of Programming Erlang boasts about spawning over 200,000 processes without breaking a sweat. I'd hate to do that with Java threads. What kind of limits am I looking at in Scala as compared to Erlang (and Java)?

Also, how does this thread re-use work in Scala? Let's assume, for simplicity, that I have only one thread. Will all the actors that I start run sequentially in this thread, or will some sort of task-switching take place? For example, if I start two actors that ping-pong messages to each other, will I risk deadlock if they're started in the same thread?

According to Programming in Scala, writing actors to use react is more difficult than with receive. This sounds plausible, since react doesn't return. However, the book goes on to show how you can put a react inside a loop using Actor.loop. As a result, you get

loop {
    react {
        ...
    }
}

which, to me, seems pretty similar to

while (true) {
    receive {
        ...
    }
}

which is used earlier in the book. Still, the book says that "in practice, programs will need at least a few receive's". So what am I missing here? What can receive do that react cannot, besides return? And why do I care?

Finally, coming to the core of what I don't understand: the book keeps mentioning how using react makes it possible to discard the call stack to re-use the thread. How does that work? Why is it necessary to discard the call stack? And why can the call stack be discarded when a function terminates by throwing an exception (react), but not when it terminates by returning (receive)?

I have the impression that Programming in Scala has been glossing over some of the key issues here, which is a shame, because otherwise it's a truly excellent book.

+21  A: 

First, each actor waiting on receive is occupying a thread. If it never receives anything, that thread will never do anything. An actor on react does not occupy any thread until it receives something. Once it receives something, a thread gets allocated to it, and it is initialized in it.

Now, the initialization part is important. A receiving thread is expected to return something, a reacting thread is not. So the previous stack state at the end of the last react can be, and is, wholly discarded. Not needing to either save or restore the stack state makes the thread faster to start.

There are various performance reasons why you might want one or other. As you know, having too many threads in Java is not a good idea. On the other hand, because you have to attach an actor to a thread before it can react, it is faster to receive a message than react to it. So if you have actors that receive many messages but do very little with it, the additional delay of react might make it too slow for your purposes.

Daniel
+9  A: 

The answer is "yes" - if your actors are not blocking on anything in your code and you are using react, then you can run your "concurrent" program within a single thread (try setting the system property actors.maxPoolSize to find out).

One of the more obvious reasons why it is necessary to discard the call stack is that otherwise the loop method would end in a StackOverflowError. As it is, the framework rather cleverly ends a react by throwing a SuspendActorException, which is caught by the looping code which then runs the react again via the andThen method.

Have a look at the mkBody method in Actor and then the seq method to see how the loop reschedules itself - terribly clever stuff!

oxbow_lakes
+3  A: 

Those statements of "discarding the stack" confused me also for a while and I think I get it now and this is my understanding now. In case of "receive" there is a dedicated thread blocking on the message (using object.wait() on a monitor) and this means that the complete thread stack is available and ready to continue from the point of "waiting" on receiving a message. For example if you had the following code

  def a = 10;
  while (! done)  {
     receive {
        case msg =>  println("MESSAGE RECEIVED: " + msg)
     }
     println("after receive and printing a " + a)
  }

the thread would wait in the receive call until the message is received and then would continue on and print the "after receive and printing a 10" message and with the value of "10" which is in the stack frame before the thread blocked.

In case of react there is no such dedicated thread, the whole method body of the react method is captured as a closure and is executed by some arbitrary thread on the corresponding actor receiving a message. This means only those statements that can be captured as a closure alone will be executed and that's where the return type of "Nothing" comes to play. Consider the following code

  def a = 10;
  while (! done)  {
     react {
        case msg =>  println("MESSAGE RECEIVED: " + msg)
     }
     println("after react and printing a " + a) 
  }

If react had a return type of void, it would mean that it is legal to have statements after the "react" call ( in the example the println statement that prints the message "after react and printing a 10"), but in reality that would never get executed as only the body of the "react" method is captured and sequenced for execution later (on the arrival of a message). Since the contract of react has the return type of "Nothing" there cannot be any statements following react, and there for there is no reason to maintain the stack. In the example above variable "a" would not have to be maintained as the statements after the react calls are not executed at all. Note that all the needed variables by the body of react is already be captured as a closure, so it can execute just fine.

The java actor framework Kilim actually does the stack maintenance by saving the stack which gets unrolled on the react getting a message.

Aswin
Thanks, that was very informative. But didn't you mean `+a` in the code snippets, instead of `+10`?
jqno
+2  A: 

Just to have it here:

Event-Based Programming without Inversion of Control

These papers are linked from the scala api for Actor and provide the theoretical framework for the actor implementation. This includes why react may never return.

Hexren
And the second paper. Bad spam control... :([Actors that Unify Threads and Events][2] [2]: http://lamp.epfl.ch/~phaller/doc/haller07coord.pdf "Actors that Unify Threads and Events"
Hexren