views:

1598

answers:

5

What is meant by using an explicit memory fence?

+3  A: 

In my experience it refers to a memory barrier, which is an instruction (explicit or implicit) to synchronize memory access between multiple threads.

The problem occurs in the combination of modern agressive compilers (they have amazing freedom to reorder instructions, but usually know nothing of your threads) and modern multicore CPUs.

A good introduction to the problem is the "The 'Double-Checked Locking is Broken' Declaration". For many, it was the wake-up call that there be draygons.

Implicit full memory barriers are usually included in platform thread synchronization routines, which cover the core of it. However, for lock-free programming and implementing custom, lightweight synchronization patterns, you often need just the barrier, or even a one-way barrier only.

peterchen
+13  A: 

For performance gains modern CPUs often execute instructions out of order to make maximum use of the available silicon (including memory read/writes). Because the hardware enforces instruction integrity you never notice this in a single thread of execution. However for multiple threads or environments with volitile memory (memory mapped I/O for example) this can lead to unpredictable behaviour.

A memory fence/barrier is a class of instructions that mean memory read/writes occur in the order you expect. For example a 'full fence' means all read/writes before the fence are comitted before those after the fence.

Note memory fences are a hardware concept. In higher level languages we are used to dealing with mutexes and semaphores - these may well be implemented using memory fences at the low level and explicit use of memory barriers are not necessary. Use of memory barriers requires a careful study of the hardware architecture and more commonly found in device drivers than application code.

The CPU reordering is different from compiler optimisations - although the artefacts can be similar. You need to take separate measures to stop the compiler reordering your instructions if that may cause undesirable behaviour (e.g. use of the volatile keyword in C).

Gwaredd
I do not think volatile is enough to stop the compiler reordering; AFAIK it only makes sure the compiler cannot cache the variable value. The Linux kernel uses a gcc extension (__asm__ __volatile__("": : :"memory")) to create a full compiler optimization barrier.
CesarB
yes, volatile doesnt care about threads.
Johannes Schaub - litb
true, volatile isn't thread aware but you can use it to stop the compiler applying certain optimisations - this is unrelated to fences ;)
Gwaredd
(.NET CLR) volatile reads are acquire fences, writes are release fences. Interlocked ops are full as is the MemoryBarrier method.
Luke Puplett
Interesting read about the volatile keyword in .net can be found here http://www.albahari.com/threading/part4.aspx#_NonBlockingSynch The site contains a lot of useful information on threading in c#
Bas Smit
A: 

Wikipedia knows all...

Memory barrier, also known as membar or memory fence, is a class of instructions which cause a central processing unit (CPU) to enforce an ordering constraint on memory operations issued before and after the barrier instruction.

CPUs employ performance optimizations that can result in out-of-order execution, including memory load and store operations. Memory operation reordering normally goes unnoticed within a single thread of execution, but causes unpredictable behaviour in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent, and defined by the architecture's memory model. Some architectures provide multiple barriers for enforcing different ordering constraints.

Memory barriers are typically used when implementing low-level machine code that operates on memory shared by multiple devices. Such code includes synchronization primitives and lock-free data structures on multiprocessor systems, and device drivers that communicate with computer hardware.

Omar Kooheji
lol @ Wikipedia knows all
epochwolf
A: 

The memory fence is all the stuff beyond which that only my privileged class is allowed to ponder. ;)

kenny
+1  A: 

Copying my answer to another question, What are some tricks that a processor does to optimize code?:

The most important one would be memory access reordering.

Absent memory fences or serializing instructions, the processor is free to reorder memory accesses. Some processor architectures have restrictions on how much they can reorder; Alpha is known for being the weakest (i.e., the one which can reorder the most).

A very good treatment of the subject can be found in the Linux kernel source documentation, at Documentation/memory-barriers.txt.

Most of the time, it's best to use locking primitives from your compiler or standard library; these are well tested, should have all the necessary memory barriers in place, and are probably quite optimized (optimizing locking primitives is tricky; even the experts can get them wrong sometimes).

CesarB