I asked this question a few weeks ago, but I'm still having the problem and I have some new hints. The original question is here:

http://stackoverflow.com/questions/1651887/java-random-slowdowns-on-mac-os

Basically, I have a Java application that splits a job into independent pieces and runs them in separate threads. The threads have no synchronization and share no in-memory data. The only resources they do share are data files on the hard disk, with each thread having its own open file channel.

Most of the time it runs very fast, but occasionally it runs very slowly for no apparent reason. If I attach a CPU profiler to it, it starts running quickly again. If I take a CPU snapshot, it says it's spending most of its time in "self time" in a function that doesn't do anything except check a few (unshared, unsynchronized) booleans. I don't see how this could be accurate because, first, it makes no sense, and second, attaching the profiler seems to knock the threads out of whatever mode they're in and fix the problem. Also, regardless of whether it runs fast or slow, it always finishes and gives the same output, and total CPU usage never dips (in this case it stays at ~1500%), implying that the threads aren't getting blocked.

I have tried different garbage collectors, different sizes for the parts of the memory space, writing data output to non-RAID drives, and putting all data output in threads separate from the main worker threads.

Does anyone have any idea what kind of problem this could be? Could it be the operating system (OS X 10.6.2)? I have not been able to duplicate it on a Windows machine, but I don't have one with a similar hardware configuration.

A: 

How do you know it's running slowly? How do you know that it runs quicker when the CPU profiler is active? If you do the entire run under the profiler, does it ever run slowly? If you restrict the number of threads to one, does it ever run slowly?

djna
I have a printout that periodically tells me how many messages I'm processing per second. Generally it is in the 300,000 range, but when it runs slowly it is ~15,000. When I hook the CPU profiler up, it keeps running slowly because of the profiler overhead, but when I disconnect the profiler it starts running fast. So I can fix a slow-running process just by attaching and detaching the profiler. I have never been able to reproduce it with one thread, but it doesn't occur with any regularity. Usually I kick off a batch of 100 runs, and when I come back to check on it later, the run it's on is going slowly.
javajustice
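A per-thread messages-per-second printout like the one described above could be sketched as follows. This is only an illustration, not the asker's code: the class name `ThroughputMeter` and its methods are hypothetical, and it assumes each worker thread owns its own instance so that nothing is shared between threads.

```java
public class ThroughputMeter {
    // Hypothetical helper: one instance per worker thread, so no synchronization is needed.
    private long count = 0;      // messages recorded so far
    private long lastCount = 0;  // value of count at the previous sample
    private long lastNanos;      // timestamp of the previous sample

    public ThroughputMeter(long nowNanos) {
        this.lastNanos = nowNanos;
    }

    // Call once per processed message.
    public void record() {
        count++;
    }

    // Returns messages per second since the previous sample, then resets the window.
    public double sample(long nowNanos) {
        double seconds = (nowNanos - lastNanos) / 1e9;
        double rate = seconds > 0 ? (count - lastCount) / seconds : 0.0;
        lastCount = count;
        lastNanos = nowNanos;
        return rate;
    }

    public static void main(String[] args) {
        ThroughputMeter meter = new ThroughputMeter(System.nanoTime());
        for (int i = 0; i < 1_000_000; i++) {
            meter.record(); // stand-in for processing one message
        }
        System.out.printf("%s: %.0f msgs/s%n",
                Thread.currentThread().getName(), meter.sample(System.nanoTime()));
    }
}
```

Logging the rate per thread, rather than one total, would show whether a slowdown hits every worker at once or only some of them.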
I haven't reproduced it while it's running in a profiler (like NetBeans or VisualVM) because I can't tell if it's running slowly because of the profiler or because of the bug. It has occurred when I've run with -Xprof, though.
javajustice
A: 

Actually, this is an interesting problem; I'm curious to know what the cause is.

  • First, in your previous question, you said you split the job between "multiple" processors. Are they physically separate, as in multiple machines, or is it a multi-core CPU?

  • Second, I'm not sure whether Snow Leopard has something to do with it, but we know that SL introduced a few new features for multi-processor machines, so there might be some problem with the VM on the new OS. Try another Java version: I know SL uses Java 6 by default, so try Java 5.

  • Third, did you try making the thread pool a little smaller? You are talking about 100 threads running at the same time. Try 20 or 40, for example, and see if it makes a difference.

  • Finally, I would be interested in seeing how you implemented the multi-threading solution. Small parts of the code would be good.

medopal
Yeah, it's 2 multicore CPUs: 2 of the new Nehalem Xeons with hyperthreading. So I have 8 physical cores, or 16 logical cores with hyperthreading. I split the process into only 15 threads, leaving one logical core out so the machine remains responsive for other things. I will try using the older JDK and let you know.
javajustice
The threading model is very simple because the problem is easily partitionable. All I do is spawn 15 completely independent threads, with no synchronization or shared data (beyond some static constants), start them, and then join each of them.
javajustice
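For reference, the spawn/start/join model described in the comment above might look roughly like this. It is a minimal sketch, not the asker's actual code: the `Worker` class and the range-summing workload are hypothetical stand-ins for one independent piece of the job, and the thread count is derived from `Runtime.availableProcessors()` minus one to mirror the "15 of 16 logical cores" setup.

```java
import java.util.ArrayList;
import java.util.List;

public class IndependentWorkers {
    // Hypothetical stand-in for one partition of the job: sums its own private range.
    static final class Worker extends Thread {
        private final long from;
        private final long to;
        private long result; // written only by this thread, read by the parent after join()

        Worker(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public void run() {
            long sum = 0;
            for (long i = from; i < to; i++) {
                sum += i;
            }
            result = sum;
        }

        long result() {
            return result;
        }
    }

    // Splits [0, total) into threadCount ranges, runs one thread per range, and combines the results.
    public static long runPartitioned(int threadCount, long total) throws InterruptedException {
        List<Worker> workers = new ArrayList<>();
        long chunk = total / threadCount;
        for (int t = 0; t < threadCount; t++) {
            long from = t * chunk;
            long to = (t == threadCount - 1) ? total : from + chunk; // last worker takes the remainder
            Worker w = new Worker(from, to);
            workers.add(w);
            w.start();
        }
        long sum = 0;
        for (Worker w : workers) {
            w.join(); // join() guarantees the worker's writes are visible here
            sum += w.result();
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        // Leave one logical core free, as described in the thread.
        int threads = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
        System.out.println(runPartitioned(threads, 1_000_000L));
    }
}
```

Note that even with no explicit sharing, workers in a layout like this still share the heap, the garbage collector, and the CPU caches, so "no shared data" at the source level does not rule out interference at the hardware or JVM level.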
What has me puzzled is how it can run slowly but still use the same amount of CPU and still get the same answer. The CPU profiling results make no sense, as I mentioned: they show a huge percentage of time spent checking some booleans in a non-synchronized method.
javajustice
When you create 15 threads, you are not actually using 15 cores and leaving one free! It's the OS's responsibility to distribute the threads between the cores. Using Activity Monitor, you can see how many threads the application is using; it will usually be 5-6 more than your own threads.
medopal
Yeah, it's a few more, but those threads aren't doing anything. There are some extra error-handling threads, some shutdown hooks, etc., that are blocked the entire time.
javajustice
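One way to check from inside the JVM which threads exist and whether the extra ones really are idle is to list the live Java threads with their states. The sketch below uses the standard `Thread.getAllStackTraces()` call; the class name `ThreadStateSummary` is hypothetical, and the assumption is that it would be invoked from within the affected application (for example, from a periodic logging hook). It only sees Java-level threads, so the count may not match what Activity Monitor reports.

```java
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateSummary {
    // Counts live Java threads by state (RUNNABLE, BLOCKED, WAITING, ...).
    public static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print each live thread with its current state, then the totals per state.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            System.out.println(t.getName() + " -> " + t.getState());
        }
        System.out.println(countByState());
    }
}
```

If the worker threads show up as RUNNABLE during a slow phase, that would be consistent with the observation that CPU usage stays at ~1500% and the threads are busy rather than blocked.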