There is something that bugs me about the Java memory model (if I even understand everything correctly). If there are two threads A and B, there are no guarantees that B will ever see a value written by A unless both A and B synchronize on the same monitor.
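A minimal sketch of that guarantee (class and field names are illustrative): a value written by thread A inside a `synchronized` block is guaranteed to be visible to thread B once B enters the same monitor, here coordinated with `wait`/`notifyAll`.

```java
// Illustrative sketch: monitor-based visibility between two threads.
public class Handoff {
    private final Object lock = new Object();
    private int value;
    private boolean ready = false;

    void put(int v) {
        synchronized (lock) {
            value = v;           // write guarded by the monitor
            ready = true;
            lock.notifyAll();
        }
    }

    int take() throws InterruptedException {
        synchronized (lock) {
            // Entering the same monitor makes the writer's changes visible.
            while (!ready) lock.wait();
            return value;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Handoff h = new Handoff();
        new Thread(() -> h.put(42)).start();
        System.out.println(h.take()); // prints 42
    }
}
```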

For any system architecture that guarantees cache coherency between threads, there is no problem. But if the architecture does not support cache coherency in hardware, this essentially means that whenever a thread enters a monitor, all memory changes made before must be committed to main memory, and the cache must be invalidated. And it needs to be the entire data cache, not just a few lines, since the monitor has no information about which variables in memory it guards. That would surely impact the performance of any application that needs to synchronize frequently (especially things like job queues with short-running jobs). So can Java work reasonably well on architectures without hardware cache coherency? If not, why doesn't the memory model make stronger guarantees about visibility? Wouldn't it be more efficient if the language required information about what is guarded by a monitor?

As I see it, the memory model gives us the worst of both worlds: the absolute need to synchronize even if cache coherency is guaranteed in hardware, and on the other hand bad performance on incoherent architectures (full cache flushes). So shouldn't it be more strict (require information about what is guarded by a monitor) or more loose, restricting potential platforms to cache-coherent architectures?

As it is now, it doesn't make much sense to me. Can somebody clear up why this specific memory model was chosen?


EDIT: My use of "strict" and "loose" was a bad choice in retrospect. I used "strict" for the case where fewer guarantees are made and "loose" for the opposite. To avoid confusion, it's probably better to speak in terms of stronger or weaker guarantees.

+2  A: 

the absolute need to synchronize, even if cache coherency is guaranteed in hardware

Yes, but then you only have to reason against the Java Memory Model, not against the particular hardware architecture that your program happens to run on. Plus, it's not only about the hardware: the compiler and JIT themselves might reorder instructions, causing visibility issues. Synchronization constructs in Java address visibility and atomicity consistently at all levels of code transformation (compiler/JIT/CPU/cache).

and on the other hand bad performance on incoherent architectures (full cache flushes)

Maybe I misunderstood something, but with incoherent architectures, you have to synchronize critical sections anyway. Otherwise, you'll run into all sorts of race conditions due to reordering. I don't see why the Java Memory Model makes the matter any worse.

shouldn't it be more strict (require information what is guarded by a monitor)

I don't think it's possible to tell the CPU to flush a particular part of the cache at all. The best the compiler can do is emit memory fences and let the CPU decide which parts of the cache need flushing - that's still more coarse-grained than what you're looking for, I suppose. Even if more fine-grained control were possible, I think it would make concurrent programming even more difficult (it's difficult enough already).

AFAIK, the Java 5 memory model (just like the .NET CLR memory model) is more "strict" than the memory models of common architectures like x86 and IA64. Therefore, it makes reasoning about programs relatively simpler. Yet it obviously shouldn't offer something closer to sequential consistency, because that would hurt performance significantly, as fewer compiler/JIT/CPU/cache optimizations could be applied.

Buu Nguyen
I hadn't put any thought into reordering issues, but if a reordering boundary is needed, just accessing a volatile field would provide one without any need for synchronization. If the memory model included guarantees about the visibility of non-volatile fields, a single volatile field could be used to communicate the validity of any number of non-volatile fields between two threads (although at the expense of polling that one volatile field).
Durandal
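For what it's worth, the Java 5 memory model does make this piggybacking guarantee: plain writes that precede a volatile write become visible to any thread that subsequently reads that volatile field. The idea in the comment above can be sketched like this (names are illustrative):

```java
// Sketch of publishing plain fields via a single volatile flag,
// relying on the Java 5 happens-before rules for volatile.
public class Publish {
    static int payload;                // plain, non-volatile field
    static volatile boolean published; // the one volatile "valid" flag

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!published) { }     // poll the volatile flag
            // The volatile read above happens-after the volatile write
            // below, so the earlier plain write to payload is visible.
            System.out.println(payload); // prints 42
        });
        reader.start();
        payload = 42;                  // plain write first...
        published = true;              // ...then the volatile write
        reader.join();
    }
}
```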
Regarding incoherent architectures: for a volatile field, the JIT can use instructions that bypass the cache, or, if those are absent, just flush the single cache line that covers the field. With the semantics of synchronized there is no restriction on which field(s) are to be published to another thread, hence I concluded that the entire cache needs to be flushed in that case. That's what bugs me. Flushing the entire cache (potentially megabytes of dirty cache) even if just a few fields need to be published to another thread seems inefficient (of course it depends highly on the application).
Durandal
I have to admit that I have intimate knowledge of only one processor family, and that one includes instructions to flush/invalidate single lines as well as a memory page (as in MMU page). Since this is a performance-relevant aspect for device drivers that perform DMA, I would assume that any reasonable architecture has a way to accomplish this. But I might be completely wrong with this assumption.
Durandal
A: 

The caches that the JVM has access to are really just CPU registers. Since there aren't many of them, flushing them upon monitor exit isn't a big deal.

EDIT: (in general) the memory caches are not under the control of the JVM; the JVM cannot choose to read/write/flush these caches, so forget about them in this discussion.

Imagine each CPU has 1,000,000 registers. The JVM happily exploits them to do crazy fast computations - until it bumps into a monitor enter/exit and has to flush 1,000,000 registers to the next cache layer.

If we lived in that world, either Java would have to be smart enough to analyze which objects aren't shared (the majority of objects aren't), or it would have to ask programmers to do that.

The Java memory model is a simplified programming model that allows average programmers to write OK multithreaded algorithms. By 'simplified' I mean there might be 12 people in the entire world who have really read chapter 17 of the JLS and actually understood it.

irreputable
I don't think this is correct. The CPU also has memory caches on the order of megabytes which front main memory: http://en.wikipedia.org/wiki/CPU_cache. The issue is when this cache flushes its dirty lines to main memory and invalidates the lines that have been updated in main memory.
Gray
The semantics of caches and registers are different - registers are thread-local, while the cache may be shared between threads (or possibly even processes, depending on the architecture).
Durandal
Dudes, the memory caches you talked about are not programmable by the JVM, so they are irrelevant to our discussion.
irreputable