I am confused: Microsoft says memory alignment is required for InterlockedExchange, but the Intel documentation says that memory alignment is not required for LOCK. Am I missing something? Thanks.

From the Microsoft MSDN Library:

Platform SDK: DLLs, Processes, and Threads InterlockedExchange

The variable pointed to by the Target parameter must be aligned on a 32-bit boundary; otherwise, this function will behave unpredictably on multiprocessor x86 systems and any non-x86 systems.

From the Intel Software Developer’s Manual:

  • LOCK instruction Causes the processor’s LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal insures that the processor has exclusive use of any shared memory while the signal is asserted.

    The integrity of the LOCK prefix is not affected by the alignment of the memory field. Memory locking is observed for arbitrarily misaligned fields.

  • Memory Ordering in P6 and More Recent Processor Families

    Locked instructions have a total order.

  • Software Controlled Bus Locking

    The integrity of a bus lock is not affected by the alignment of the memory field. The LOCK semantics are followed for as many bus cycles as necessary to update the entire operand. However, it is recommended that locked accesses be aligned on their natural boundaries for better system performance:

      • Any boundary for an 8-bit access (locked or otherwise).
      • 16-bit boundary for locked word accesses.
      • 32-bit boundary for locked doubleword accesses.
      • 64-bit boundary for locked quadword accesses.
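
To make the "natural boundary" recommendation concrete, here is a minimal sketch of a run-time alignment check in C. The helper name `is_naturally_aligned` is hypothetical (it comes from neither Microsoft nor Intel), and the sketch assumes the operand size is a power of two, as it is for the 8-, 16-, 32- and 64-bit accesses listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p lies on a multiple of `size` bytes.
 * `size` is the operand size in bytes and must be a power of two
 * (1, 2, 4 or 8 for the accesses discussed in the quoted manual text). */
static int is_naturally_aligned(const void *p, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    /* For a power-of-two size, the low bits of an aligned address are zero. */
    return ((uintptr_t)p & (size - 1)) == 0;
}
```

For example, `is_naturally_aligned(&value, sizeof value)` could be asserted in a debug build before handing the address of a 32-bit `value` to an interlocked function.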

A: 

Even though the lock prefix doesn't require aligned memory, and the cmpxchg instruction that's probably used to implement InterlockedExchange() doesn't require alignment either, if the OS has enabled alignment checking then cmpxchg will raise an alignment check exception (#AC) when executed with unaligned operands. Check the documentation for cmpxchg and similar instructions, looking at the list of protected-mode exceptions. I don't know for sure that Windows enables alignment checking, but it wouldn't surprise me.

csgordon
+1  A: 

Hey, I answered a few questions related to this. Also keep in mind:

  1. There is NO byte-level InterlockedExchange; there IS, however, a 16-bit (short) InterlockedExchange.
  2. The documentation discrepancy you refer to is probably just a documentation oversight.
  3. If you want to do byte/bit-level atomic access, there ARE plenty of ways to do this with the existing intrinsics: Interlocked[And8|Or8|Xor8].
  4. Any operation where you're doing high-performance locking (using machine code like you discuss) should not operate on unaligned data (a performance anti-pattern).
  5. xchg (an instruction with an implicit LOCK prefix, optimized because it can use a cache lock and avoid a full bus lock to main memory) CAN do 8-bit interlocked operations.
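
As a rough illustration of points 3 and 5, here is a portable sketch using C11 `<stdatomic.h>` in place of the MSVC-specific `Interlocked*8` intrinsics. That substitution is an assumption made so the example compiles outside Windows; the function names (`flags_set`, `flags_clear`, `byte_exchange`, `flags_demo`) are hypothetical, and which machine instructions the compiler emits for them is up to the compiler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomically OR `mask` into a single byte; returns the previous value. */
static uint8_t flags_set(_Atomic uint8_t *flags, uint8_t mask)
{
    return atomic_fetch_or(flags, mask);
}

/* Atomically clear the bits in `mask`; returns the previous value. */
static uint8_t flags_clear(_Atomic uint8_t *flags, uint8_t mask)
{
    return atomic_fetch_and(flags, (uint8_t)~mask);
}

/* Atomically replace a single byte; returns the previous value. */
static uint8_t byte_exchange(_Atomic uint8_t *target, uint8_t value)
{
    return atomic_exchange(target, value);
}

/* Small usage example on a one-byte flag word. */
static void flags_demo(void)
{
    _Atomic uint8_t flags = 0x01;

    uint8_t before = flags_set(&flags, 0x04);   /* flags becomes 0x05 */
    assert(before == 0x01);

    before = flags_clear(&flags, 0x01);         /* flags becomes 0x04 */
    assert(before == 0x05);

    before = byte_exchange(&flags, 0xFF);       /* flags becomes 0xFF */
    assert(before == 0x04);
    assert(atomic_load(&flags) == 0xFF);
}
```

A single byte has no alignment constraint, which is why byte-sized atomics sidestep the alignment question entirely.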

I nearly forgot: Intel's TBB defines 8-byte (64-bit) load/store routines without the use of implicit or explicit locking (in some cases):

.code 
    ALIGN 4
    PUBLIC c __TBB_machine_load8
__TBB_machine_Load8:
    ; If location is on stack, compiler may have failed to align it correctly, so we do dynamic check.
    mov ecx,4[esp]
    test ecx,7
    jne load_slow ; misaligned path; the load_slow label is not part of this excerpt
    ; Load within a cache line
    sub esp,12
    fild qword ptr [ecx]
    fistp qword ptr [esp]
    mov eax,[esp]
    mov edx,4[esp]
    add esp,12
    ret

EXTRN __TBB_machine_store8_slow:PROC
.code 
    ALIGN 4
    PUBLIC c __TBB_machine_store8
__TBB_machine_Store8:
    ; If location is on stack, compiler may have failed to align it correctly, so we do dynamic check.
    mov ecx,4[esp]
    test ecx,7
    jne __TBB_machine_store8_slow ;; tail call to tbb_misc.cpp
    fild qword ptr 8[esp]
    fistp qword ptr [ecx]
    ret
end

Anyhow, hope that clears at least some of this up for you.

RandomNickName42
+4  A: 

Once upon a time, Microsoft supported Windows NT on processors other than x86, such as MIPS, PowerPC, and Alpha. These processors all require alignment for their interlocked instructions, so Microsoft put the requirement in its spec to ensure that these primitives would be portable to different architectures.
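
A minimal sketch of writing to that portable contract, assuming C11 rather than the Win32 API: `atomic_exchange` stands in for `InterlockedExchange` so the example builds on any platform, and the names `shared_state` and `exchange_state` are hypothetical. The point is only that the target is declared with an explicit 32-bit alignment and the precondition is checked before the exchange.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared variable. alignas(4) spells out the 32-bit boundary
 * that the MSDN text requires of the Target parameter. */
static alignas(4) _Atomic int32_t shared_state = 0;

/* Atomically stores new_value and returns the previous value. */
static int32_t exchange_state(int32_t new_value)
{
    /* Debug-build check of the documented precondition. */
    assert(((uintptr_t)&shared_state & 3u) == 0);
    return atomic_exchange(&shared_state, new_value);
}
```

Ordinary variables of a 32-bit type already get this alignment from the compiler; trouble usually comes from packed structs or pointers cast from byte buffers, where an explicit check like the one above can help.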

Chris Dodd
Also x64 mode requires alignment on interlocked operations
Rom