I'm exploring the possibility of running a Java app on a machine with very large amounts of RAM (anywhere from 300GB to 15TB, probably on an SGI Altix 4700 machine), and I'm curious as to how Java's GC is likely to perform in this scenario.

I've heard that IBM's or JRockit's JVMs may be better suited to this than Sun's. Does anyone know of any research or data on JVM performance in this situation?

A: 

Surely the answer as to how the GC's going to perform is "who cares?" ;-)

Will Dean
Heh, well, for example it would be problematic if it caused the machine to freeze for several hours without warning while it did a full GC.
sanity
It seems obvious that the author of the question cares. Duh.
Guge
I think Will was just humorously saying "GC is unlikely to even occur." That's probably true for the stuff I run; no clue about the questioner. :)
skiphoppy
At home I have 15TB RAM (I play games) and a simple Hello World works OK.
geeeeeeeeeek
@geeeeeeeeeek: What kind of games do you play with 15TB RAM? SimUniverse, maybe?
Michael Myers
+3  A: 

The question is: do you want to run within a single process (JVM) or not? If you do, then you're going to have a problem. Refer to Tuning Java Virtual Machines, the Oracle Coherence User Guide and similar documentation. The rule of thumb I've operated by is to try to avoid heaps larger than 1GB. Whereas a 512MB-1GB full GC might take less than a second, a 2-4GB full GC could potentially take 5 seconds or longer. Obviously this depends on many factors, but the moral of the story is that GC overhead does not scale linearly, and once you get into the one-second range, performance degrades rapidly.
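
One way to check figures like these on your own hardware is to read the collector statistics the JVM exposes through `java.lang.management`. The following is a minimal sketch, not a benchmark: the class name `GcTimeProbe`, the `churn` workload and its sizes are made up for illustration, and `getCollectionTime()` reports approximate accumulated collection time per collector, not the length of individual pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcTimeProbe {

    /** Sums the accumulated collection time (ms) reported by every collector in this JVM. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Hypothetical workload: allocates 64 KB chunks and keeps a fraction alive for a while. */
    public static int churn(int rounds) {
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            byte[] chunk = new byte[64 * 1024];
            if (i % 16 == 0) {
                retained.add(chunk);   // keep every 16th chunk reachable
            }
            if (retained.size() > 256) {
                retained.clear();      // then drop the batch so it becomes garbage
            }
        }
        return retained.size();
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        churn(200_000);
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        System.out.println("wall time (ms): " + wallMillis);
        System.out.println("GC time (ms):   " + gcMillis);
    }
}
```

Running it with different `-Xmx` values gives a rough feel for how collection time changes with heap size. For the length of individual pauses, the JVM's own GC logging is the more reliable source.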

cletus
Surely GC is done in a separate thread, isn't it?
Paul Tomblin
Paul, I believe most GC is done in a separate thread, but occasionally a full GC is required and it blocks the application. I'm not 100% sure, though.
sanity
Paul: With Java 5, GC runs in a separate thread but there are situations when it has to move "special" data (like the stack). In this case, it will block the VM. Java 6 is better but it still blocks sometimes.
Aaron Digulla
+2  A: 

This is not at all answering your question, but if you plan to deploy a huge Java app you might be interested in looking into Azul Systems appliances. They claim to be able to garbage-collect a single heap of up to 670 GB without creating a pause in the application.

Vinko Vrsalovic
One difference is that Azul is designed for Java, it doesn't even have a C compiler!
Peter Lawrey
+3  A: 

Sun's JVM allows you to configure and optimize the heck out of garbage collection, but it's a science unto itself: http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html

You might have to do some reading and research, but for that kind of machine, GC settings optimized for the machine and application probably make a big difference.

Michael Borgwardt
+3  A: 

Since 5.0 the HotSpot JVM has used a concept known as Ergonomics to try to optimise memory usage. This is based on more than just the sheer amount of memory available, and it affects heap sizes, generation sizes and garbage collection algorithms.
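
To see what the JVM actually picked on a given machine, you can ask it at runtime. This is a small sketch using only standard `java.lang.management` calls. The class name `ErgonomicsReport` is invented for the example, and the pool names it prints depend on the JVM and on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class ErgonomicsReport {

    /** Builds a short report of the processor count, max heap and memory pool limits. */
    public static String describe() {
        StringBuilder sb = new StringBuilder();
        Runtime rt = Runtime.getRuntime();
        sb.append("processors: ").append(rt.availableProcessors()).append('\n');
        sb.append("max heap (MB): ").append(rt.maxMemory() / (1024 * 1024)).append('\n');

        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 means the maximum is undefined
            sb.append(pool.getName())
              .append(" [").append(pool.getType()).append("] max: ")
              .append(max < 0 ? "undefined" : (max / (1024 * 1024)) + " MB")
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Comparing the output with and without explicit `-Xmx` or generation-size options shows which values come from your settings and which the JVM chose on its own.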

Start by having a read of this, which explains Ergonomics and more:

http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf

There's also a guy called Brian Goetz that's written numerous articles about how Java allocates and uses memory, all of which and more can be found here:

http://www.briangoetz.com/pubs.html

Nick Holt
+4  A: 

On the Sun JVM, you can use the option -XX:+UseConcMarkSweepGC to turn on the Concurrent Mark Sweep (CMS) collector, which will avoid the "stop the world" phases of the default GC algorithm almost completely, at the cost of a little bit more overhead.

The advice to use more than one VM on such a machine is IMHO outdated. In real-world applications you often have enough shared data that the performance with CMS and one JVM is better.
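
If you want to confirm which collector a running JVM ended up with, a sketch like the following can help. The class name `CollectorCheck` is made up for the example. The names returned by the MXBeans are implementation-specific (on HotSpot builds with CMS enabled you would typically expect something like "ParNew" and "ConcurrentMarkSweep"), and newer HotSpot releases no longer include CMS at all, so treat the output as informational.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class CollectorCheck {

    /** Names of the garbage collectors active in this JVM, as reported by the platform MXBeans. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** The -XX: options this JVM was started with, e.g. a collector selection flag. */
    public static List<String> xxFlags() {
        List<String> flags = new ArrayList<>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-XX:")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println("collectors: " + collectorNames());
        System.out.println("-XX flags:  " + xxFlags());
    }
}
```

Started as `java -XX:+UseConcMarkSweepGC CollectorCheck` on a JVM that supports the flag, the first line should list the CMS-related collectors and the second should echo the flag back.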

kohlerm
There are many valid reasons other than this to use multiple JVMs, and using only one will really restrict your ability to maintain uptime while cycling in changes and dealing with faults.
cletus
Agreed. But the question was about "performance"
kohlerm
A: 

There are some additional answers in previous responses to a similar question

Steve B.
A: 

The only people who can really tell you are SGI. Supercomputers don't behave like regular servers, only bigger.

However, I have found that Java performs best when memory is local to the processors accessing it. Note: the GC needs to be able to walk the whole memory end to end. This means it doesn't scale well if the machine is designed like lots of computers stuck together, which may be the case here. The memory module size is 32 GB, so you may get better performance if you limit your JVM to fit comfortably into this size.

Peter Lawrey
+1  A: 

You might want to consider running a virtual Terracotta cluster on this machine.

John Nilsson