Hi, I'm trying to test my Java app on Solaris SPARC and I'm getting some weird behavior. I'm not looking for flame wars. I'm just curious to know what is happening or what is wrong...

I'm running the same JAR on Intel and on the T1000. On the Windows machine I'm able to get 100% CPU utilisation (Performance Monitor), but on the Solaris machine I can only get 25% (prstat).

The application is a custom server app I wrote that uses netty as the network framework.

On the Windows machine I'm able to reach just above 200 requests/responses a second, including full business logic and access to outside 3rd parties, while on the Solaris machine I get about 150 requests/responses a second at only 25% CPU.

One can only imagine how many more requests/responses I could get out of the SPARC if I could make it use its full power.

The servers are...

  • Windows 2003 SP2 x64, 8 GB, 2.39 GHz Intel, 4 cores
  • Solaris 10.5 64-bit, 8 GB, 1 GHz, 6 cores

Both are using JDK 1.6u21.

Any ideas?

+2  A: 

The T1000 uses a multi-core CPU, which means that the CPU can run multiple threads simultaneously. If the CPU is at 100% utilization, it means that all cores are running at 100%. If your application uses fewer runnable threads than the number of cores, then it cannot keep all the cores busy, and therefore cannot use 100% of the CPU.

Erick Robertson
His post seems to indicate that the Windows server has a 4-core CPU and the Solaris server has a 6-core CPU. If that's the case, the application must be capable of utilizing at least four cores.
Chris Shouts
The T1000 has hyperthreading, too. It claims to be able to handle perhaps 8 tasks per core if I'm reading right. This being the case, if his application used 12 threads, it would run the Windows server at 100% and the Solaris at 25%.
Erick Robertson
Yes, the Windows machine is an Intel 4-core.
Just something that stood out. You commented in the-alchemist's answer that you configure 2 * cpu cores for "worker/selector" threads? Try changing that to 8 * cpu cores. The T1000 docs I read said that it could handle 8 tasks per core, and this strangely comes out to the 25% CPU usage that you reported.
Erick Robertson
But what about the other 300 handler threads?
It depends on which threads are bottlenecking the application. If those handler threads all require one of the worker/selector threads to manage them, you could very likely see this behavior. Make sure that the worker/selector threads are properly handing the work off to the handler and then getting another request to hand off to another handler, etc. Make sure this isn't blocking until the handler thread finishes its job.
Erick Robertson
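The hand-off described in the comment above can be sketched in plain Java. This is only an illustration of the pattern, not Netty's actual internals: the class name `HandOffSketch`, the method `dispatch`, and the 10 ms sleep standing in for business logic are all hypothetical. The point it shows is that the dispatching thread calls `execute()` and moves straight on to the next request instead of waiting for the handler to finish.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class HandOffSketch {

    // Hypothetical stand-in for a worker/selector thread: it hands each
    // request to the handler pool and does not wait for the result.
    static int dispatch(int requests, int handlerThreads) {
        ExecutorService handlers = Executors.newFixedThreadPool(handlerThreads);
        final AtomicInteger completed = new AtomicInteger();
        final CountDownLatch done = new CountDownLatch(requests);

        for (int i = 0; i < requests; i++) {
            // execute() queues the task and returns immediately, so this
            // loop is free to pick up the next request right away.
            handlers.execute(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(10); // assumed placeholder for business logic
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                }
            });
        }

        try {
            done.await(); // only so the demo can report a final count
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        handlers.shutdown();
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("handled " + dispatch(100, 8) + " requests");
    }
}
```

If the dispatching thread instead called something like `Future.get()` right after submitting each task, only one handler would ever be busy at a time, no matter how many handler threads were configured.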
Well, that's Netty, so I'll give it the benefit of the doubt. And if that were the case, then Windows wouldn't go to 100% either. I tried 48 worker/selector threads and got the same result.
+1  A: 

Without any code, it's hard to help out. Some ideas:

  • Profile the Java app on both systems, and see where the difference is. You might be surprised. Because the T1 CPU lacks out-of-order execution, you might see performance lacking in strange areas.
  • As Erick Robertson says, try bumping up the number of threads to the number of virtual cores reported via prstat, NOT the number of regular cores. The T1000 uses UltraSparc T1 processors, which make heavy use of thread-level parallelism.
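
As a minimal sketch of the second suggestion, the JVM can report how many processors it sees, and a pool can be sized from that number. The class name `PoolSizing` and the helper `hardwareThreads` are hypothetical. `Runtime.availableProcessors()` is a standard call; on a CMT machine it is expected to return the count of virtual processors rather than physical cores, but that is worth printing on each machine rather than assuming.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {

    // Hypothetical helper: number of processors the JVM reports.
    // On the T1000 this should be compared against what prstat/psrinfo show.
    static int hardwareThreads() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int n = hardwareThreads();
        System.out.println("JVM sees " + n + " processors");

        // Size a pool from the reported count instead of a hard-coded number.
        ExecutorService workers = Executors.newFixedThreadPool(n);
        workers.shutdown();
    }
}
```

Running this on both the Windows and the Solaris machine shows whether a thread count tuned for one box is far too low for the other.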

Also, note that you're using the latest-gen Intel processors and old Sun ones. I highly recommend reading Developing and Tuning Applications on UltraSPARC T1 Chip Multithreading Systems and Maximizing Application Performance on Chip Multithreading (CMT) Architectures, both by Sun.

The Alchemist
In the idle state the application uses 37 threads. In the working state it uses 337. Basically, Netty configures 2 * CPU cores for "worker/selector" threads and I have it configured to use 300 "handler" threads. So that is 8 threads + 300 + whatever else the JVM, Hibernate and so on are using...
Hmm... in that case, I would highly recommend profiling to see where the bottlenecks are. If Netty is the bottleneck, you'll just have to play with different parameters to eke a little more performance out. If the bottleneck is in your code, you'll have a lot more flexibility. Keep in mind that CPU usage in `prstat` doesn't take into account IO wait time, which in web apps can be pretty substantial. In short, I recommend profiling and seeing what's taking up so much time.
The Alchemist
Anyway, I looked at the documents provided above. So far I can't really tell based on the descriptions alone, since when I check the stats it seems that all CPU threads are being used. The best thing would be to talk to an Oracle engineer, but since this is the only machine I have and it was purchased through a 3rd-party vendor, I have no support package. So I guess I'll leave it at that for now.
@user432024: Sorry to hear about that. Having developed multi-threaded software for the T2 chips, I can say that performance tuning can be a bit difficult. It's a CMT, no out-of-order execution chip. Taking advantage of all those cores can be difficult, but well worth it in the end. Once you get all cores blasting away at 100%, it flies.
The Alchemist
A: 

I think the number of cores does not matter, as Java multithreading cannot harness the capability of multiple cores, AFAIK. Had it been running JDK 1.7 beta and used the new fork-join threading to utilize the multi-core processor, the scenario would have been different. But because the T1000 has hyperthreading capabilities (thanks to "the-alchemist" for the link), the OS, Solaris, which has been specifically written for that hardware, can really make a difference.

Amit
Why would you think that Java's multithreading cannot use multiple cores?
Gabe
I read it here: http://www.ibm.com/developerworks/java/library/j-jtp11137.html
Amit
@Gabe - Also, can you give a code example showing how you would use Java multithreading to utilize multiple cores? It would be a great learning experience for me.
Amit
Amit: Any multithreading you do in Java will make use of multiple cores. Nothing in the article you link to says otherwise. All it says is that current Java libraries make it hard to write code that uses all of your cores, not that it's impossible.
Gabe
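For illustration, here is a minimal sketch of the kind of example requested above, using only `java.util.concurrent` as it exists in JDK 1.6 (no fork-join). The class name `MultiCoreSum` and the method `parallelSum` are hypothetical. Each chunk of the range is summed by its own pool thread, and the operating system is free to schedule those threads on different cores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MultiCoreSum {

    // Sums 1..n by splitting the range into `threads` chunks,
    // each computed by a separate thread in the pool.
    static long parallelSum(long n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            long chunk = n / threads;
            for (int t = 0; t < threads; t++) {
                final long from = t * chunk + 1;
                // The last chunk absorbs any remainder of the division.
                final long to = (t == threads - 1) ? n : (t + 1) * chunk;
                parts.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long sum = 0;
                        for (long i = from; i <= to; i++) {
                            sum += i;
                        }
                        return sum;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // wait for each partial result
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSum(2000000000L, threads));
    }
}
```

Watching a CPU monitor while `main` runs with a large `n` should show several cores busy at once, which is the behavior described in the comment above.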
@Amit: That's a very broad-reaching and inaccurate statement. It's the *web application* that has to be written to take advantage of multiple cores. The VM only has to expose facilities, such as threads (or fork-join), that allow applications to be written as multi-core apps.
The Alchemist