views:

107

answers:

4

How well optimized is Java's parallel collecting GC for multithreaded environments? I've written some multithreaded Jython code that spends most of its time calling Java libraries. Depending on which options I run the program with, the library calls either do tons of allocations under the hood or virtually none. When I use the options that require tons of heap allocations, I can't get the code to scale past 6 cores. When I use the options that don't require lots of allocations, it scales to at least 20. How likely is it that this is related to a GC bottleneck, given that I'm using the stock Sun VM, the parallel GC and Jython as my glue language?

Edit: Just to clarify, I won't necessarily think of stuff that's obvious to Java veterans because I almost never use Java/JVM languages. I do most of my programming in D and the flagship CPython implementation of Python. I'm using the JVM and Jython for a small one-off project b/c I need access to a Java library.

+3  A: 

Since your question is about GC bottlenecks: you can confirm or rule out that possibility by turning on GC logging and checking the logs. If there are a large number of GC events with long pauses, that supports the theory; otherwise you can discount it. (However, in the scenario you describe, I would guess it is not a GC issue.)
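
For reference, GC logging on the Sun/HotSpot VM of that era is typically enabled with command-line flags along these lines (exact flag names can vary by JVM version; the Jython launcher usually forwards JVM options given with a -J prefix, but check your launcher's docs):

    # Hedged sketch: enable human-readable GC logging on a HotSpot VM
    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log MyApp

    # Through the Jython launcher, JVM options are usually passed with a -J prefix
    jython -J-verbose:gc -J-XX:+PrintGCDetails -J-Xloggc:gc.log myscript.py

Each log line then shows the collection type, heap occupancy before and after, and the pause time, which is what you want to inspect for long or very frequent pauses.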

tholomew
Thanks, I had no idea it was so easy to get a human-readable log of what the GC is doing until you pointed it out and I googled it. It's definitely not that easy in D, and AFAIK not that easy in CPython either. The logs definitely clarify one thing: GC is running several times per **second**. I'm amazed the code scales even as well as it does.
dsimcha
+1  A: 

The Java GC is generational. A collection of the first (young) generation is meant to take care of short-lived objects and is expected to run frequently. Running for a short interval several times per second is expected behaviour when there are many short-lived allocations. (This should be a comment rather than an answer - I have no rep, sorry.)
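
If frequent minor collections do turn out to be the cost, one common mitigation on HotSpot-style VMs is to give the young generation more room, so short-lived objects can die before being promoted. A hedged sketch, with flag names from Sun's VM of that era and purely illustrative sizes:

    # Illustrative sizes only; tune these against your own GC logs
    java -Xms512m -Xmx512m -Xmn256m MyApp    # pin the young generation at 256 MB
    java -XX:NewRatio=2 MyApp                # or: young gen = 1/3 of the total heap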

Also, depending on which VM you are using, you can choose between GC algorithms; the available options vary by VM version and vendor.
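
On Sun's HotSpot VM of that period, for example, the collector can be selected explicitly with flags like the following (names differ on other vendors' VMs and in newer versions):

    java -XX:+UseSerialGC ...          # serial, single-threaded collector
    java -XX:+UseParallelGC ...        # parallel (throughput) collector
    java -XX:+UseConcMarkSweepGC ...   # concurrent mark-sweep (low-pause) collector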

Some (old) info is here: http://java.sun.com/developer/technicalArticles/Programming/turbo/#The_new_GC

Burleigh Bear
+1  A: 

To me, problems with GC and multithreading are very real. I'm not saying the JVM is bad; it's just that the problem itself is very hard to deal with.

In one of our projects, we had two applications running in a single JVM (an app server). When we stressed them individually everything was fine, but when both were stressed together performance degraded in strange ways. We finally split the apps into two JVMs, and performance went back to normal (slower than with only one app in use, of course, but reasonable).

Tuning the GC is extremely hard. Things can improve for 5 minutes, and then a major collection blocks everything, etc. You have to decide whether you want high throughput or low latency: high throughput is fine for batch processing, while low latency is necessary for interactive applications. Ultimately, the JVM's default parameters were the ones that gave us the best results!
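
(For what it's worth, the parallel collector also lets you state a goal instead of hand-tuning generation sizes; a hedged sketch using Sun HotSpot flags of that era, with illustrative values:)

    java -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 MyApp   # prefer shorter pauses
    java -XX:+UseParallelGC -XX:GCTimeRatio=19 MyApp         # aim to spend <=5% of time in GC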

That's not really an answer, more a report from experience, but yes, to me GC and multithreading can be an issue.

ewernli
I'm aware that GC/multithreading issues exist, but Java's GC is highly refined and benefits so much from the strictness of Java as a language (no unions, raw pointers, etc.) that I thought these were mostly solved problems. In D, the problems are even bigger, but since I generally write more "from scratch" code, as opposed to glue code, in D and can do more dirty tricks to avoid unnecessary allocations, they're also easier to work around. Working around them in the current project would basically mean rewriting a large library, I think.
dsimcha
A: 

Threading performance can vary from one JDK version to another. In my experience, on JDK 6u18 the parallel GC, enabled with -XX:+UseParallelGC (not the concurrent mark-sweep GC), performs very well on a quad core with hundreds of very active threads. I consider it very unlikely that it would not scale beyond 6 cores.
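
If scaling with core count is the concern, the parallel collector's worker thread count and old-generation behaviour are also tunable; a hedged sketch with HotSpot flags of that era (thread count purely illustrative):

    java -XX:+UseParallelGC -XX:ParallelGCThreads=8 MyApp    # GC worker threads (defaults scale with cores)
    java -XX:+UseParallelGC -XX:+UseParallelOldGC MyApp      # also collect the old generation in parallel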

The fact that Sun's hardware is based on processors with a high number of cores explains why they've put a lot of effort into new garbage collectors in recent years.

The parallel GC is not enabled by default because its single-threaded performance is not as good as that of the default GC.

jvdneste