Joe Duffy, gives 6 rules that describe the CLR 2.0+ memory model (it's actual implementation, not any ECMA standard) I'm writing down my attempt at figuring this out, mostly as a way of rubber ducking, but if I make a mistake in my logic, at least someone here will be able to catch it before it causes me grief.
- Rule 1: Data dependence among loads and stores is never violated.
- Rule 2: All stores have release semantics, i.e. no load or store may move after one.
- Rule 3: All volatile loads are acquire, i.e. no load or store may move before one.
- Rule 4: No loads and stores may ever cross a full-barrier (e.g. Thread.MemoryBarrier, lock acquire, Interlocked.Exchange, Interlocked.CompareExchange, etc.).
- Rule 5: Loads and stores to the heap may never be introduced.
- Rule 6: Loads and stores may only be deleted when coalescing adjacent loads and stores from/to the same location.
I'm attempting to understand these rules.
x = y
y = 0 // Cannot move before the previous line according to Rule 1.
x = y
z = 0
// equates to this sequence of loads and stores before possible re-ordering
load y
store x
load 0
store z
Looking at this, it appears that the load 0 can be moved up to before load y, but the stores may not be re-ordered at all. Therefore, if a thread sees z == 0, then it also will see x == y.
If y was volatile, then load 0 could not move before load y, otherwise it may. Volatile stores don't seem to have any special properties, no stores can be re-ordered with respect to each other (which is a very strong guarantee!)
Full barriers are like a line in the sand which loads and stores can not be moved over.
No idea what rule 5 means.
I guess rule 6 means if you do:
x = y
x = z
Then it is possible for the CLR to delete both the load to y and the first store to x.
x = y
z = y
// equates to this sequence of loads and stores before possible re-ordering
load y
store x
load y
store z
// could be re-ordered like this
load y
load y
store x
store z
// rule 6 applied means this is possible?
load y
store x // but don't pop y from stack (or first duplicate item on top of stack)
store z
What if y was volatile? I don't see anything in the rules that prohibits the above optimization from being carried out. This does not violate double-checked locking, because the lock() between the two identical conditions prevents the loads from being moved into adjacent positions, and according to rule 6, that's the only time they can be eliminated.
So I think I understand all but rule 5, here. Anyone want to enlighten me (or correct me or add something to any of the above?)