views:

401

answers:

3

I'm coming from Java, where I'd submit Runnables to an ExecutorService backed by a thread pool. It's very clear in Java how to set limits to the size of the thread pool.

I'm interested in using Scala actors, but I'm unclear on how to limit concurrency.

Let's just say, hypothetically, that I'm creating a web service which accepts "jobs". A job is submitted via a POST request, and I want my service to enqueue the job and immediately return 202 Accepted, i.e. the jobs are handled asynchronously.

If I'm using actors to process the jobs in the queue, how can I limit the number of simultaneous jobs that are processed?

I can think of a few different ways to approach this; I'm wondering if there's a community best practice, or at least, some clearly established approaches that are somewhat standard in the Scala world.

One approach I've thought of is having a single coordinator actor which would manage the job queue and the job-processing actors; I suppose it could use a simple int field to track how many jobs are currently being processed. I'm sure there'd be some gotchas with that approach, however, such as making sure to decrement the count when a job fails with an error. That's why I'm wondering if Scala already provides a simpler or more encapsulated approach to this.
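To make the idea concrete, here's roughly the bookkeeping I have in mind. All the names are made up, and a plain synchronized class stands in for what would really be the coordinator actor's receive loop:

```scala
import scala.collection.mutable.Queue

// Hypothetical sketch: the counting logic a coordinator actor would perform.
// submit() corresponds to handling a "new job" message, completed() to a
// "done" or "failed" message from a worker.
class JobCoordinator(maxConcurrent: Int) {
  private val pending = Queue.empty[() => Unit]
  private var inFlight = 0

  def submit(job: () => Unit): Unit = synchronized {
    pending.enqueue(job)
    dispatch()
  }

  // Must be called on success AND on failure, or the slot leaks.
  def completed(): Unit = synchronized {
    inFlight -= 1
    dispatch()
  }

  def currentInFlight: Int = synchronized { inFlight }

  private def dispatch(): Unit =
    while (inFlight < maxConcurrent && pending.nonEmpty) {
      inFlight += 1
      val job = pending.dequeue()
      job() // in a real system: forward to a worker actor instead
    }
}
```

The "gotcha" is exactly the `completed()` call: it has to happen on every exit path, including errors.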

BTW I tried to ask this question a while ago but I asked it badly.

Thanks!

+5  A: 

You can override the system properties actors.maxPoolSize and actors.corePoolSize which limit the size of the actor thread pool and then throw as many jobs at the pool as your actors can handle. Why do you think you need to throttle your reactions?
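For example (the scheduler's pool is created lazily, so this has to run before the first actor starts; the numbers here are just illustrative):

```scala
// Hypothetical helper: configure the actor thread pool via system properties.
// Must be invoked before any actor is started, because the pool is built
// lazily on first use and ignores later changes.
object PoolConfig {
  def configure(): Unit = {
    System.setProperty("actors.corePoolSize", "4") // threads kept alive when idle
    System.setProperty("actors.maxPoolSize", "8")  // upper bound on pool growth
  }
}
```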

oxbow_lakes
Very useful, thanks! I'm not sure I'd use the term _throttle_, but either way, there are times where one needs to constrain the number of simultaneous "processes" because the work they do is resource-intensive.
Avi Flax
This approach might not yield the desired result. It will allow jobs to be queued up until the JVM runs out of memory. Limiting the number of threads that actors can use will just limit the number of jobs that are actually executed concurrently. I've produced OOM errors before by generating work faster than the actors could process it, so you have to be careful.
Erik Engbrecht
I'm thinking one downside of this approach is that it's global. Sometimes I have different types of processes I need to run which have different levels of resource utilization — with Java thread pools, I can easily use different pools with different settings. With `actors.maxPoolSize`, I can only use a single number for all actors, because they're all powered by the same thread pool, right?
Avi Flax
+5  A: 

I'd really encourage you to have a look at Akka, an alternative Actor implementation for Scala.

http://www.akkasource.org

Akka already has a JAX-RS[1] integration and you could use that in concert with a LoadBalancer[2] to throttle how many actions can be done in parallel:

[1] http://doc.akkasource.org/rest [2] http://github.com/jboner/akka/blob/master/akka-patterns/src/main/scala/Patterns.scala

Viktor Klang
Interesting, I'll check it out, thanks!
Avi Flax
+2  A: 

You really have two problems here.

The first is keeping the thread pool used by actors under control. That can be done by setting the system property actors.maxPoolSize.

The second is runaway growth in the number of tasks that have been submitted to the pool. You may or may not be concerned with this one; however, it is entirely possible to trigger failure conditions such as out-of-memory errors, and in some cases potentially more subtle problems, by generating too many tasks too fast.

Each worker thread maintains a deque of tasks. The deque is implemented as an array that the worker thread dynamically enlarges up to some maximum size. In 2.7.x the deque can grow quite large, and I've seen that trigger out-of-memory errors when combined with lots of concurrent threads. The maximum deque size is smaller in 2.8. The deque can also fill up.

Addressing this problem requires you to control how many tasks you generate, which probably means some sort of coordinator as you've outlined. I've encountered this problem when the actors that initiate a kind of data processing pipeline are much faster than the ones later in the pipeline. To control the process, I usually have the actors later in the chain ping back the actors earlier in the chain every X messages, and have the earlier ones stop after X messages and wait for the ping back. You could also do it with a more centralized coordinator.
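A rough sketch of that ping-back scheme, with the actor messaging replaced by plain method calls so just the counting is visible (all names are made up):

```scala
// Hypothetical windowed flow control between two pipeline stages.
// The upstream stage stops sending once `window` messages are unacknowledged;
// the downstream stage "pings back" after every `window` messages it handles,
// which releases the upstream stage to continue.
class FlowControl(window: Int) {
  private var unacked = 0
  private var processed = 0

  // Upstream checks this before sending; false means wait for a ping.
  def maySend: Boolean = synchronized { unacked < window }
  def sent(): Unit = synchronized { unacked += 1 }

  // Downstream calls this after handling each message; every `window`
  // messages it acknowledges the whole batch (the "ping back").
  def handled(): Unit = synchronized {
    processed += 1
    if (processed % window == 0) unacked -= window
  }
}
```

In the real version, `sent()` and `handled()` would be message sends between actors, but the invariant is the same: at most `window` messages in flight between any two stages.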

Erik Engbrecht