views: 298

answers: 3

I'm just learning, and really liking, the Actor pattern. I'm using Scala right now, but I'm interested in the architectural style in general, as it's used in Scala, Erlang, Groovy, etc.

The case I'm thinking of is one where I need to do things concurrently, such as, let's say, "run a job".

With threading, I would create a thread pool and a blocking queue, and have each thread poll the queue and process jobs as they arrive.
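
For concreteness, the threading version I have in mind looks roughly like this sketch (Job is just a placeholder type, and poolSize would come from a config setting):

    import java.util.concurrent.LinkedBlockingQueue

    // Placeholder job type, just for illustration
    case class Job(id: Int) {
      def run(): Unit = println("running job " + id)
    }

    class ThreadPoolRunner(poolSize: Int) {
      private val queue = new LinkedBlockingQueue[Job]()

      def start(): Unit =
        for (_ <- 1 to poolSize) {
          new Thread(new Runnable {
            def run(): Unit = while (true) {
              val job = queue.take()   // blocks until a job is available
              job.run()
            }
          }).start()
        }

      def submit(job: Job): Unit = queue.put(job)
    }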

With actors, what's the best way to handle this? Does it make sense to create a pool of actors and somehow send them messages containing the jobs? Maybe with a "coordinator" actor?

Note: an aspect of the case I forgot to mention: what if I want to constrain the number of jobs my app will process concurrently? Maybe with a config setting? I was thinking that a pool might make it easy to do this.

Thanks!

+5  A: 

A pool is a mechanism you use when the cost of creating and tearing down a resource is high. In Erlang this is not the case, so you should not maintain a pool.

You should spawn processes as you need them and destroy them when you have finished with them.
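
In Scala's actor library, the equivalent "spawn one per job, then throw it away" approach might look roughly like this sketch (runJob is a placeholder for the real work):

    import scala.actors.Actor.actor

    object SpawnPerJob {
      // Placeholder for whatever the job actually does
      def runJob(job: String): Unit = println("processing " + job)

      // One short-lived actor per job; it does its work and terminates
      def submit(job: String): Unit = {
        actor { runJob(job) }
      }
    }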

Gordon Guthrie
Thanks, but what if I want to constrain the number of jobs my app will process concurrently? Maybe with a config setting? I was thinking that a pool makes it easy to do this.
Avi Flax
@Avi: I think you need to make a distinction here. A "pool" usually refers to (at least for me) actually preserving the actors/processes and reusing them. In Erlang you don't need to do that, you can just throw them away and spawn new ones. Of course you can implement a "global counter" (in the form of a server process, in an ets table, etc.), which you can poll before spawning another job process. In fact you do need some facility like this to achieve load control...
Zed
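
In Scala such a counting coordinator might look roughly like this sketch (the Throttle class and its messages are invented purely for illustration):

    import scala.actors.Actor
    import scala.actors.Actor._
    import scala.collection.mutable.Queue

    case class Submit(job: () => Unit)
    case object Done

    // Tracks how many job actors are in flight and queues submissions beyond
    // maxConcurrent (which could come from a config setting).
    class Throttle(maxConcurrent: Int) extends Actor {
      private var inFlight = 0
      private val pending = new Queue[() => Unit]

      def act() = loop {
        react {
          case Submit(job) =>
            if (inFlight < maxConcurrent) startJob(job) else pending.enqueue(job)
          case Done =>
            inFlight -= 1
            if (pending.nonEmpty) startJob(pending.dequeue())
        }
      }

      private def startJob(job: () => Unit): Unit = {
        inFlight += 1
        val coordinator = this
        actor { try job() finally coordinator ! Done }   // throwaway actor per job
      }
    }

    // usage: val t = new Throttle(4); t.start(); t ! Submit(() => doWork())
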
@Zed: Good points, thanks. Maybe I should have asked "What's a good way to limit concurrency with actors?"
Avi Flax
You might be able to use Actor.mailboxSize to get the kind of constraint you're looking for.
marc esher
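
For example, the sending side could hold off when a worker's mailbox grows past some configured limit (worker and limit are hypothetical names here):

    // only send more work while the worker's mailbox is below the limit
    if (worker.mailboxSize < limit) worker ! job
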
This approach also seems to be a way to achieve what you're seeking: http://stackoverflow.com/questions/1007010/can-scala-actors-process-multiple-messages-simultaneously
marc esher
+3  A: 

Sometimes it makes sense to limit how many worker processes you have operating concurrently on a large task list, because the tasks the processes are spawned to complete involve resource allocation. At the very least processes use up memory, but they could also hold open files and/or sockets, which tend to be limited to a few thousand and fail miserably and unpredictably once you run out.

To have a pull-driven task pool, one can spawn N linked processes that ask for a task and are handed a function they can spawn_monitor. As soon as the monitored process has ended, they come back for the next task. Specific needs drive the details, but that is the outline of one approach.
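
A rough Scala approximation of that outline (Scala's actors have no spawn_monitor, so here the per-task actor simply reports back to its worker when it finishes; the names are made up and pool shutdown is left out):

    import scala.actors.Actor
    import scala.actors.Actor._

    case object GetTask
    case class Task(work: () => Unit)
    case object NoMoreTasks
    case object TaskDone

    // N workers pull tasks from the pool; each task runs in a fresh actor.
    class TaskPool(tasks: Iterator[() => Unit], workers: Int) extends Actor {
      def act() = {
        for (_ <- 1 to workers) newWorker().start()
        loop {
          react {
            case GetTask =>
              if (tasks.hasNext) reply(Task(tasks.next())) else reply(NoMoreTasks)
          }
        }
      }

      private def newWorker() = {
        val pool = this
        new Actor {
          def act() = {
            pool ! GetTask
            loop {
              react {
                case Task(work) =>
                  val worker = this
                  // run the task in a fresh, throwaway actor (clean slate)
                  actor { try work() finally worker ! TaskDone }
                case TaskDone => pool ! GetTask   // come back for the next task
                case NoMoreTasks => exit()
              }
            }
          }
        }
      }
    }

    // usage: new TaskPool(jobs.iterator, 4).start()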

The reason I would let each task spawn a new process is that processes do have some state, and it is nice to start off with a clean slate. It is a common fine-tuning to set the min-heap size of a process so as to minimize the number of GCs needed during its lifetime. Freeing all of a process's memory at once and starting fresh for the next task is also a very efficient form of garbage collection.

Does it feel weird to use twice as many processes like that? It's a feeling you need to overcome in Erlang programming.

Christian
Very interesting, thanks! I might mark this as the "accepted" answer, I gotta think this over.
Avi Flax
+2  A: 

There is no best way for all cases. The decision depends on the number, duration, arrival, and required completion time of the jobs.

The most obvious difference between just spawning off actors and using a pool is that in the former case your jobs will all finish at nearly the same time, while in the latter case completion times will be spread out. The average completion time will be the same, though.

The advantage of just spawning actors is simplicity of coding, as it requires no extra handling. The trade-off is that your actors will be competing for your CPU cores: you will not be able to run more jobs in parallel than you have CPU cores (or hardware threads), no matter what programming paradigm you use.

As an example, imagine that you need to execute 100,000 jobs, each taking one minute, and the results are due next month. You have four cores. Would you spawn off 100,000 actors and have them all compete over the resources for a month, or would you just queue your jobs up and execute four at a time?

As a counterexample, imagine a web server running on the same machine. If you have five requests, would you prefer to serve four users in time T and one in 2T, or all five in roughly 1.25T?

Zed