views:

51

answers:

2

Across multiple languages (mostly D and Java/Jython) I've noticed that parallel programs with no obvious synchronization bottleneck often don't scale well to 4 or more cores because of memory management bottlenecks. I'm aware that thread-local allocators mitigate this problem, but most garbage collector implementations still need to stop the world. Garbage collection is not embarrassingly parallel (shared state has to be updated way too often), so using a parallel collector doesn't completely solve the problem. In the case of manual memory management, even if allocations are mostly from a thread-local allocator, the memory still has to be freed, possibly from a different thread than the one it was allocated in.

Is there any language/runtime/malloc implementation for which the memory management bottleneck to SMP parallelism is for all practical purposes a solved problem, while still allowing traditional shared address space multithreading?
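To make the cross-thread free problem concrete, here is a minimal Java sketch of a thread-local pool. Everything in it (`ThreadLocalPool`, `Buffer`, `Local`, the 4096-byte size) is a hypothetical name chosen for illustration, not taken from any real allocator. The owning thread allocates and frees without synchronization, while a free from any other thread has to go through a shared concurrent queue, which is exactly where the contention creeps back in.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch only: a per-thread buffer pool with a "remote free" queue.
final class ThreadLocalPool {
    static final int SIZE = 4096; // assumed fixed buffer size

    // A buffer remembers which per-thread pool it was allocated from.
    static final class Buffer {
        final byte[] data = new byte[SIZE];
        final Local owner;
        Buffer(Local owner) { this.owner = owner; }
    }

    // Per-thread state, created lazily on the thread that first allocates.
    static final class Local {
        final Thread thread = Thread.currentThread();
        // Touched only by the owning thread, so no synchronization is needed.
        final ArrayDeque<Buffer> free = new ArrayDeque<>();
        // Buffers freed by other threads land here; this is the shared state.
        final ConcurrentLinkedQueue<Buffer> remoteFree = new ConcurrentLinkedQueue<>();
    }

    private static final ThreadLocal<Local> LOCAL = ThreadLocal.withInitial(Local::new);

    static Buffer allocate() {
        Local local = LOCAL.get();
        Buffer b = local.free.poll();
        if (b == null) {
            // Fast path is empty: reclaim anything other threads handed back.
            Buffer r;
            while ((r = local.remoteFree.poll()) != null) {
                local.free.push(r);
            }
            b = local.free.poll();
        }
        return b != null ? b : new Buffer(local);
    }

    static void release(Buffer b) {
        if (Thread.currentThread() == b.owner.thread) {
            b.owner.free.push(b);       // same thread: unsynchronized fast path
        } else {
            b.owner.remoteFree.add(b);  // different thread: synchronized slow path
        }
    }
}
```

In a producer/consumer workload nearly every `release` takes the slow path, so the thread-local fast path helps much less than it does when each thread frees its own memory.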

A: 

No.

What you describe as the memory management bottleneck is an intrinsic feature, albeit not a desirable one, of SMP computers. Sooner or later the demands of processors for access to memory will overwhelm the memory bus and processing will slow down, or at best stop going faster, with the addition of more processors.

I'm surprised that you generally run into this problem on only 4 cores. In the past I've used SGI Altix machines on which some of my codes would scale well up to and beyond 256 cores. But whether it's your code hitting the bottleneck at 4 cores, mine at 256, or another code at 2048 (if you can find a 2048-core SMP), there will always be a bottleneck.

High Performance Mark
I think you misunderstood the question. I was talking about memory allocation/deallocation, not memory access.
dsimcha
@dsimcha: yes, I often misunderstand questions on SO.
High Performance Mark
A: 

One improvement on an SMP system would be a NUMA-aware allocator/collector, which the HotSpot JVM already has for its parallel collector (enabled with the `-XX:+UseNUMA` flag): http://download-llnw.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#numa

andras