ansaurus

Question

java cpu-intensive app stalls / hangs when increasing nr of workers. Where is the bottleneck, and how to deduce/ moitor it on linux/ubuntu server

Answer 1

A:

Did jvisualvm give you any useful information?

Thorbjørn Ravn Andersen 2009-12-23 09:11:51

I'll check and report back.. ( I didn't know the tool ;-)

Geert-Jan 2009-12-23 11:06:49

Answer 2

A:

Only profiling will help.

but things to check the worksers get information from a queue, what type of queue is that is the producer queue thread save ? Why use Executors.newScheduledThreadPool to create your workers ? dont you just want them to run immediately ?

Peter 2009-12-23 09:53:09

the message queue is Amazon Simple Queue Service (AWS SQS). Fetching from the queue is non blocking (afaik) Executors.newScheduledThreadPool is used for the purpose of having a a small rampoff (as I believe the English term is) between workers. So intializing of workers is more fluent.

Geert-Jan 2009-12-23 11:06:15

Answer 3

A:

If I understood correctly, multiple workers are all fetching from the same queue, make calculations and hand the result off to their private writers, like:

              / [ worker ] - [ writer, queue ]
[ msg-queue ] - [ worker ] - [ writer, queue ]
              \ [ worker ] - [ writer, queue ]

workers might be blocking to get to the msg queue, adding a reader managing a queue of work items solve this problem if it occurs, like:

                                   / [ worker ] - [ writer, queue ]
[ msg-queue ] - [ fetcher, queue ] - [ worker ] - [ writer, queue ]
                                   \ [ worker ] - [ writer, queue ]

Another thing I pick up from your description is that the calculations make use of a set of collections in a read-only fashion so concurrency should not be a problem. It might be a good idea to investigate which implementation you use, even if you don't synchronise use in your part of the code, collection classes like Vector and Hashtable synchronize by default.

Using immutable versions of collection classes would help to make sure usage of the maps can be concurrent by default.

rsp 2009-12-23 10:05:44

your model is correct. the queue is Amazon Simple Queue Service (SQS) which is designed for such a thing I believe. I will check the java-client implementation (SQS) though just to be sure. As for the collections: standar Java.util + some basis arrays. I will wrap the collections in immutable just to be sure. Thanks

Geert-Jan 2009-12-23 11:37:53

Answer 4

A:

If I were you, I wouldn't put much faith in anybody's guesswork as to what the problem is. I hate to sound like a broken record, but there's a very simple way to find out - stackshots. For example, in your 4-worker case that is running 20 times slower, every time you take a sample of a worker's call stack, the probability is 19/20 that it will be in the hanging state, and you can see why just by examining the stack.

Mike Dunlavey 2009-12-23 14:26:25

Just only now saw your comment. Do you know of a good tool to take / visualize stackshots in a linux server environment?

Geert-Jan 2009-12-28 20:45:52

**pstack** is one such tool. In a case like this you need very few samples - in fact, just **one sample** is almost certain to show you exactly where the problem is.

Mike Dunlavey 2009-12-29 03:10:13

ansaurus

tags:

views:

answers:

java cpu-intensive app stalls / hangs when increasing nr of workers. Where is the bottleneck, and how to deduce/ moitor it on linux/ubuntu server

related questions