What is meant by using an explicit memory fence?
In my experience it refers to a memory barrier, which is an instruction (explicit or implicit) to synchronize memory access between multiple threads.
The problem occurs in the combination of modern agressive compilers (they have amazing freedom to reorder instructions, but usually know nothing of your threads) and modern multicore CPUs.
A good introduction to the problem is the "The 'Double-Checked Locking is Broken' Declaration". For many, it was the wake-up call that there be draygons.
Implicit full memory barriers are usually included in platform thread synchronization routines, which cover the core of it. However, for lock-free programming and implementing custom, lightweight synchronization patterns, you often need just the barrier, or even a one-way barrier only.
For performance gains modern CPUs often execute instructions out of order to make maximum use of the available silicon (including memory read/writes). Because the hardware enforces instruction integrity you never notice this in a single thread of execution. However for multiple threads or environments with volitile memory (memory mapped I/O for example) this can lead to unpredictable behaviour.
A memory fence/barrier is a class of instructions that mean memory read/writes occur in the order you expect. For example a 'full fence' means all read/writes before the fence are comitted before those after the fence.
Note memory fences are a hardware concept. In higher level languages we are used to dealing with mutexes and semaphores - these may well be implemented using memory fences at the low level and explicit use of memory barriers are not necessary. Use of memory barriers requires a careful study of the hardware architecture and more commonly found in device drivers than application code.
The CPU reordering is different from compiler optimisations - although the artefacts can be similar. You need to take separate measures to stop the compiler reordering your instructions if that may cause undesirable behaviour (e.g. use of the volatile keyword in C).
Memory barrier, also known as membar or memory fence, is a class of instructions which cause a central processing unit (CPU) to enforce an ordering constraint on memory operations issued before and after the barrier instruction.
CPUs employ performance optimizations that can result in out-of-order execution, including memory load and store operations. Memory operation reordering normally goes unnoticed within a single thread of execution, but causes unpredictable behaviour in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent, and defined by the architecture's memory model. Some architectures provide multiple barriers for enforcing different ordering constraints.
Memory barriers are typically used when implementing low-level machine code that operates on memory shared by multiple devices. Such code includes synchronization primitives and lock-free data structures on multiprocessor systems, and device drivers that communicate with computer hardware.
The memory fence is all the stuff beyond which that only my privileged class is allowed to ponder. ;)
Copying my answer to another question, What are some tricks that a processor does to optimize code?:
The most important one would be memory access reordering.
Absent memory fences or serializing instructions, the processor is free to reorder memory accesses. Some processor architectures have restrictions on how much they can reorder; Alpha is known for being the weakest (i.e., the one which can reorder the most).
A very good treatment of the subject can be found in the Linux kernel source documentation, at Documentation/memory-barriers.txt.
Most of the time, it's best to use locking primitives from your compiler or standard library; these are well tested, should have all the necessary memory barriers in place, and are probably quite optimized (optimizing locking primitives is tricky; even the experts can get them wrong sometimes).