How well optimized is Java's parallel garbage collector for multithreaded environments? I've written some multithreaded Jython code that spends most of its time calling Java libraries. Depending on which options I run the program with, the library calls either do tons of heap allocations under the hood or virtually none. With the allocation-heavy options, I can't get the code to scale past 6 cores. With the options that allocate very little, it scales to at least 20 cores. How likely is it that this is a GC bottleneck, given that I'm using the stock Sun VM, the parallel GC, and Jython as my glue language?
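One way to test the GC hypothesis would be to compare the JVM's accumulated collection time against wall-clock time for the same workload. Below is a minimal sketch, assuming the standard `java.lang.management` API is available on the VM in use. The class name `GcTimeProbe` and the allocation loop are hypothetical stand-ins for the real library calls, and the reported times are approximate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeProbe {
    // Sum the approximate accumulated collection time (ms) over all collectors.
    // A collector may report -1 when the value is undefined, so those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Sum the number of collections over all collectors, skipping undefined (-1) counts.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long gcMsBefore = totalGcMillis();
        long gcCountBefore = totalGcCount();
        long start = System.nanoTime();

        // Hypothetical allocation-heavy workload; replace with the real library calls.
        long checksum = 0;
        for (int i = 0; i < 2000000; i++) {
            byte[] junk = new byte[256];
            junk[0] = (byte) i;
            checksum += junk[0];
        }

        long wallMs = (System.nanoTime() - start) / 1000000L;
        long gcMs = totalGcMillis() - gcMsBefore;
        long gcCount = totalGcCount() - gcCountBefore;

        System.out.println("wall-clock ms: " + wallMs);
        System.out.println("GC ms:         " + gcMs);
        System.out.println("collections:   " + gcCount);
        System.out.println("checksum:      " + checksum);
    }
}
```

If the GC share of wall-clock time grows as threads are added under the allocation-heavy options but stays small under the others, that would point toward the collector. Running the JVM with `-verbose:gc` should give a similar picture without any code changes.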
Edit: Just to clarify, I may miss things that are obvious to Java veterans, because I almost never use Java or other JVM languages. I do most of my programming in D and in CPython, the flagship implementation of Python. I'm using the JVM and Jython for a small one-off project because I need access to a Java library.