views: 3262
answers: 4

I have some old Java code for a REST service that uses a separate thread for every incoming request. That is, the main loop spins on socket.accept() and hands the socket off to a Runnable, which then starts its own background thread and invokes run on itself. This worked admirably well for a while, until recently I noticed that the lag from accept to processing the request became unacceptable under high load. By "admirably well" I mean it was handling 100-200 requests a second without significant CPU usage. Performance only degraded when other daemons were adding load as well, and then only once load exceeded 5. When the machine was under high load (5-8) from a combination of other processes, the time from accept to processing would get ridiculously high (500ms to 3000ms) while the actual processing stayed sub-10ms. This is all on dual-core CentOS 5 systems.

Having been used to thread pools on .NET, I assumed that thread creation was the culprit, so I applied the same pattern in Java. Now my Runnable is executed via a ThreadPoolExecutor (and the pool uses an ArrayBlockingQueue). Again, it works great under most scenarios, but when the machine load gets high, the time from creating the Runnable until run() is invoked exhibits about the same ridiculous timing. Worse, system load nearly doubled (10-16) with the thread pool logic in place. So now I get the same latency problems with double the load.
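For concreteness, a minimal sketch of this handoff (not the original service code; the port, class name, and handler are illustrative), with a timestamp taken at accept time so the accept-to-run lag can be measured:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.*;

// Illustrative accept loop handing each socket to a bounded-queue pool.
public class AcceptLoop {
    public static void main(String[] args) throws IOException {
        ExecutorService pool = new ThreadPoolExecutor(
                10, 100, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(100));
        ServerSocket server = new ServerSocket(8080); // illustrative port
        while (true) {
            final Socket socket = server.accept();
            final long accepted = System.nanoTime(); // for measuring handoff lag
            pool.execute(new Runnable() {
                public void run() {
                    long lagMicros = (System.nanoTime() - accepted) / 1000;
                    System.out.println("accept-to-run lag: " + lagMicros + "us");
                    // request handling would go here; close when done
                    try { socket.close(); } catch (IOException ignored) {}
                }
            });
        }
    }
}
```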

My suspicion is that the lock contention of the queue is worse than the start-up cost of a new thread, which involved no locks. Can anyone share their experience of new-thread-per-request vs. a thread pool? And if my suspicion is correct, does anyone have an alternative approach for a thread pool without lock contention?

I'd be tempted to just make the whole system single-threaded, since I don't know how much my threading helps and I/O doesn't seem to be an issue, but some of my requests are long-lived and would then block everything.

thanks, arne

UPDATE: I switched over to Executors.newFixedThreadPool(100); while it maintained the same processing capacity, load pretty much immediately doubled, and running it for 12 hours showed load staying consistently at 2x. I guess in my case a new thread per request is cheaper.
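One way to quantify this comparison is a small harness that measures average creation-to-run() latency for both strategies. This is a rough sketch, not the original benchmark; the class and method names are illustrative:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Measures average creation-to-run() latency for new-Thread-per-task
// (pool == null) vs. an ExecutorService.
public class LatencyCompare {
    static long measure(final int tasks, final ExecutorService pool)
            throws InterruptedException {
        final AtomicLong totalNanos = new AtomicLong();
        final CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            final long created = System.nanoTime();
            Runnable r = new Runnable() {
                public void run() {
                    totalNanos.addAndGet(System.nanoTime() - created);
                    done.countDown();
                }
            };
            if (pool != null) pool.execute(r);
            else new Thread(r).start();
        }
        done.await();
        return totalNanos.get() / tasks / 1000; // average microseconds
    }

    public static void main(String[] args) throws Exception {
        System.out.println("thread-per-task: " + measure(1000, null) + "us");
        ExecutorService pool = Executors.newFixedThreadPool(100);
        System.out.println("fixed pool:      " + measure(1000, pool) + "us");
        pool.shutdown();
    }
}
```

Results will vary heavily with machine load, which is exactly the regime the question is about, so it's worth running this while the other daemons are busy.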

+3  A: 

measurement, measurement, measurement! Where is it spending the time? What has to happen when you create your Runnable? Does the Runnable have anything that could be blocking or delaying in the instantiation? What's happening during that delay?

I'm actually a big believer in thinking things through in general, but this sort of case, with unexpected behavior like this, just has to have some measurements.

What is the runtime environment, JVM version, and architecture?

Charlie Martin
The Runnable does nothing in its constructor other than record the current time (for measurement) and assign arguments to fields. I measured the delta from creation to run() being called: previously new Runnable() -> new Thread().start() -> run(), now new Runnable() -> pool.exec() -> run(). Intel, CentOS 5, Java 1.6.0_07.
Arne Claassen
A: 

I just did this with some of my own code. I used the NetBeans profiler to tune the thread pool implementation I was using. You should be able to do the same thing with VisualVM, but I have not tried that yet.

TofuBeer
+1  A: 

Sun's implementation of Thread, although very much faster than it used to be, does have locking. IIRC, ArrayBlockingQueue should not actually lock at all when busy. Therefore it's profiler time (or even just a few Ctrl-\ thread dumps or jstack runs).

System load just tells you how many threads are queued. It's not necessarily very useful.

Tom Hawtin - tackline
+6  A: 

With the configuration of:

new ThreadPoolExecutor(10, 100, 30, TimeUnit.SECONDS, 
        new ArrayBlockingQueue<Runnable>(100))

With that configuration, once 10 threads are concurrently processing requests, further requests are added to the queue. If the queue reaches 100 pending requests, the pool starts creating new threads, up to a maximum of 100 threads; once there are 100 threads and the queue is full, the command is rejected.
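That ordering (core threads first, then the queue, then extra threads, then rejection) can be seen with a scaled-down version of the same configuration; the sizes here are shrunk from 10/100/100 to 1/2/1 purely so the behavior is visible with four tasks:

```java
import java.util.concurrent.*;

// Demonstrates ThreadPoolExecutor's submission order with a bounded queue:
// core thread -> queue -> extra thread -> rejection.
public class BoundedQueueDemo {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(1));
        Runnable sleeper = new Runnable() {
            public void run() {
                try { Thread.sleep(1000); } catch (InterruptedException e) {}
            }
        };
        pool.execute(sleeper); // runs on the core thread
        pool.execute(sleeper); // queued (queue capacity 1)
        pool.execute(sleeper); // queue full -> second (non-core) thread created
        try {
            pool.execute(sleeper); // pool and queue both full -> rejected
        } catch (RejectedExecutionException e) {
            System.out.println("rejected, as expected");
        }
        pool.shutdown();
    }
}
```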

The section of the javadocs of ThreadPoolExecutor (copied below) may be worth another read.

Based on them, your apparent willingness to have 100 threads running, and your desire to accept all requests and process them eventually, I'd recommend trying variations like:

new ThreadPoolExecutor(100, 100, 0, TimeUnit.SECONDS, 
        new LinkedBlockingQueue<Runnable>())

Which, incidentally, is what you'd get from Executors.newFixedThreadPool(100);


Queuing

Any BlockingQueue may be used to transfer and hold submitted tasks. The use of this queue interacts with pool sizing:

  • If fewer than corePoolSize threads are running, the Executor always prefers adding a new thread rather than queuing.
  • If corePoolSize or more threads are running, the Executor always prefers queuing a request rather than adding a new thread.
  • If a request cannot be queued, a new thread is created unless this would exceed maximumPoolSize, in which case, the task will be rejected.

There are three general strategies for queuing:

  1. Direct handoffs. A good default choice for a work queue is a SynchronousQueue that hands off tasks to threads without otherwise holding them. Here, an attempt to queue a task will fail if no threads are immediately available to run it, so a new thread will be constructed. This policy avoids lockups when handling sets of requests that might have internal dependencies. Direct handoffs generally require unbounded maximumPoolSizes to avoid rejection of new submitted tasks. This in turn admits the possibility of unbounded thread growth when commands continue to arrive on average faster than they can be processed.
  2. Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.) This may be appropriate when each task is completely independent of others, so tasks cannot affect each others execution; for example, in a web page server. While this style of queuing can be useful in smoothing out transient bursts of requests, it admits the possibility of unbounded work queue growth when commands continue to arrive on average faster than they can be processed.
  3. Bounded queues. A bounded queue (for example, an ArrayBlockingQueue) helps prevent resource exhaustion when used with finite maximumPoolSizes, but can be more difficult to tune and control. Queue sizes and maximum pool sizes may be traded off for each other: Using large queues and small pools minimizes CPU usage, OS resources, and context-switching overhead, but can lead to artificially low throughput. If tasks frequently block (for example if they are I/O bound), a system may be able to schedule time for more threads than you otherwise allow. Use of small queues generally requires larger pool sizes, which keeps CPUs busier but may encounter unacceptable scheduling overhead, which also decreases throughput.
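The three strategies above correspond to these constructions (a sketch; the first two mirror what Executors.newCachedThreadPool() and Executors.newFixedThreadPool(n) build internally):

```java
import java.util.concurrent.*;

// The three queuing strategies from the ThreadPoolExecutor javadocs,
// as concrete pool constructions.
public class QueueStrategies {
    public static void main(String[] args) {
        // 1. Direct handoff: SynchronousQueue holds nothing, so the pool
        //    needs an effectively unbounded maximum to avoid rejections.
        ExecutorService handoff = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());

        // 2. Unbounded queue: never more than corePoolSize threads;
        //    excess tasks wait in the queue indefinitely.
        ExecutorService unbounded = new ThreadPoolExecutor(
                100, 100, 0, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());

        // 3. Bounded queue: caps both queued work and thread count,
        //    at the cost of possible rejections.
        ExecutorService bounded = new ThreadPoolExecutor(
                10, 100, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(100));

        handoff.shutdown();
        unbounded.shutdown();
        bounded.shutdown();
    }
}
```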
Stephen Denne