views:

213

answers:

10
+2  Q: 

JVM on multicore

I read a blog post a while ago claiming that a Java application ran better when it was restricted to a single CPU on a multicore machine: http://mailinator.blogspot.com/2010/02/how-i-sped-up-my-server-by-factor-of-6.html

What reasons could there be for a Java application running on a multicore machine to run much slower than on a single-core machine?

+1  A: 

I doubt the "Much" part.

My guess would be that the expense of moving state from one CPU to another is high enough to be noticeable. Generally you want jobs to stay on the same CPU so that their data stays in the local cache as much as possible.

Thorbjørn Ravn Andersen
+1  A: 

This is entirely speculation without the article/data in question, but there are some types of programs which are not well suited for parallelization - perhaps the application is never CPU-bound (meaning the CPU is not the bottleneck, perhaps some sort of I/O is).

However this question/conversation is pretty baseless without more details.

matt b
+1  A: 

There is no Java-specific reason for this, but moving state from core to core or even from CPU to CPU takes time. This time can be used better if the process stays on a single core. Also, caching can be improved in such cases.

This is only relevant, though, if the program does not use multiple threads and therefore cannot distribute its work across multiple cores/CPUs effectively.

Thomas Lötzer
A: 

Recent Intel CPUs have Turbo Boost:

http://en.wikipedia.org/wiki/Intel_Turbo_Boost

matiasf
That would, at worst, mean that a truly absurd (if not malicious) task scheduler could try to arrange things so that Turbo Boost doesn't kick in -- and I'm not even sure it could actually be manipulated that way. In any case it would never amount to a 6x performance difference.
Nicholas Knight
I agree. Turbo Boost doesn't even come close to 6x.
DeadMG
A: 

This will depend on the number of threads the application spawns. If you spawn, say, four worker threads doing heavy number-crunching, the app will run almost four times faster on a quad-core machine, depending on how much book-keeping and merging you have to do.
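To illustrate the near-linear case, here is a minimal sketch (class and method names are mine, not from any answer): the job is split into one chunk per worker, and the only book-keeping and merging is the final loop over the futures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

public class ParallelSum {
    // Split [0, n) into one chunk per worker and sum the squares in parallel.
    // Each chunk is independent, so the work scales with the core count.
    static long sumOfSquares(long n, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            long chunk = (n + workers - 1) / workers;
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final long lo = w * chunk;
                final long hi = Math.min(n, lo + chunk);
                parts.add(pool.submit(() ->
                        LongStream.range(lo, hi).map(i -> i * i).sum()));
            }
            long total = 0; // the merge step: cheap compared to the work itself
            for (Future<Long> part : parts) total += part.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

The result is the same regardless of the worker count; only the wall-clock time should change with the number of cores.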

johanbev
+5  A: 

If there is significant contention for shared resources among the different threads, locking and unlocking objects can require a large number of IPIs (inter-processor interrupts), and the processors may spend more time discarding their L1 and L2 caches and re-fetching data from other CPUs than they spend actually making progress on the problem at hand.

This can be a problem if the application's locking is far too fine-grained. (I once heard it summed up as "there is no point in having more than one lock per CPU cache line", which is definitely true, and perhaps still too fine-grained.)

Java's "every object is a mutex" could lead to having too many locks in the running system if too many are live and contended.

I have no doubt someone could intentionally write such an application, but it probably isn't very common. Most developers would write their applications to reduce resource contention where they can.
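As a sketch of the pathological case described above (the class is my own invention, not from the article): a single hot monitor that every thread must take serializes the work, and on multiple cores the monitor's cache line bounces between CPUs on every acquisition.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HotLock {
    private long count = 0;

    // Every thread contends for the same monitor. The increments are
    // correct, but they execute one at a time, and on a multicore machine
    // the lock's cache line ping-pongs between cores' L1/L2 caches.
    synchronized void increment() { count++; }
    synchronized long get() { return count; }

    static long run(int threads, int perThread) throws InterruptedException {
        HotLock lock = new HotLock();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) lock.increment();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return lock.get();
    }
}
```

Pinned to one core there is no cache-line migration, so code like this can genuinely run faster with fewer CPUs.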

sarnold
+1  A: 

The application could make very poor use of blocking inter-thread communication. However, that would be purely down to the application being exceptionally poorly programmed.

There is no reason at all why even a mediocrely programmed multi-core application with a moderately parallelisable workload should run slower on multiple cores.
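A hedged sketch of what "very poor use of blocking inter-thread communication" can look like (the pipeline here is my own invention): handing work across a zero-capacity queue one tiny item at a time, so both threads spend most of their time blocked on each other rather than computing.

```java
import java.util.concurrent.SynchronousQueue;

public class Handoff {
    // A SynchronousQueue has no capacity: every put() blocks until the
    // matching take(). Handing off one small item at a time like this
    // forces a rendezvous (and likely a context switch) per element, so
    // two cores can easily be slower than one thread doing a plain loop.
    static long pipelineSum(int n) throws InterruptedException {
        SynchronousQueue<Integer> queue = new SynchronousQueue<>();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) queue.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        for (int i = 0; i < n; i++) sum += queue.take();
        producer.join();
        return sum;
    }
}
```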

DeadMG
+1  A: 

From a pure performance perspective, the challenge is often around the memory subsystem. So while more CPUs is often good, having CPUs that aren't near the memory that the Java objects are sitting in is very, very expensive. It is VERY machine specific, and depends greatly on the exact path between each CPU and memory. Both Intel and AMD have had various shapes / speeds here, and the results vary greatly.

See NUMA for reasons why multi-core might hinder.

We have seen performance deltas in the 30% range or more depending on how JVMs are pinned to processors. SPECjbb2005 is now mostly run in "multi-JVM" mode with each JVM associated with a given CPU / memory for this reason.

Trent Gray-Donald
A: 

CPUs often have a limit on how much heat they can produce. This means a chip with fewer cores can run at a higher frequency, which can make a program run faster if it doesn't use the extra cores effectively. Today the choice is between 4, 6 and 8 cores, where the chips with more cores have individually slower cores. I don't know of any single-core system that is faster than the fastest 4-core system.

Peter Lawrey
In the article, the author made his server faster by allocating it to just a single CPU instead of 6.
IttayD
You are right that he is allocating all his processes to one core. If this works, he almost certainly has a tuning problem, though what it is remains unclear.
Peter Lawrey
A: 

The JIT will not emit memory barriers if it thinks it's running on a single core. I suspect that is what is happening in the referenced article.

Here is a very concise explanation of memory barriers; it also shows a neat technique for inspecting the JIT'd code: http://www.infoq.com/articles/memory_barriers_jvm_concurrency

This isn't to say all applications would benefit from being placed on a single core.
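For context, this is the kind of code where those barriers matter; a minimal visibility sketch (class name is mine). Declaring `done` volatile forces the JIT to emit the barriers, which establishes a happens-before edge so the reader is guaranteed to see `payload`. Drop the `volatile` and, on a multicore machine, the reader may spin forever.

```java
public class Visibility {
    // volatile forces the JIT to emit the memory barriers discussed above;
    // without it, the reader's loop may never observe the update.
    private volatile boolean done = false;
    private int payload = 0;

    int waitForPayload() {
        while (!done) Thread.onSpinWait(); // spin until the writer publishes
        return payload; // visible via happens-before on the volatile read
    }

    static int demo() throws InterruptedException {
        Visibility v = new Visibility();
        Thread writer = new Thread(() -> {
            v.payload = 42; // ordinary write...
            v.done = true;  // ...published by the volatile write
        });
        writer.start();
        int result = v.waitForPayload();
        writer.join();
        return result;
    }
}
```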

reccles