Use the XADD or MOV instruction instead of the ADD instruction!
See also the MFENCE, LFENCE and SFENCE instructions!
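If the code is written in C rather than assembly, the same idea can be sketched with C11 atomics. This is only a minimal sketch: the names fetch_add and fetch_add_self_check are made up for illustration. x86 compilers typically lower atomic_fetch_add to LOCK XADD when the old value is used, but the exact instruction is the compiler's choice.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically adds delta to *target and returns the value *target held
   before the addition (the same contract as XADD with a LOCK prefix).
   The function name is hypothetical. */
int fetch_add(atomic_int *target, int delta)
{
    return atomic_fetch_add(target, delta);
}

/* Small self-check: adding 5 to 10 returns the old value and stores 15. */
void fetch_add_self_check(void)
{
    atomic_int j;
    atomic_init(&j, 10);
    int old = fetch_add(&j, 5);
    assert(old == 10);
    assert(atomic_load(&j) == 15);
}
```

Taking the target by pointer keeps the sketch independent of any particular shared variable such as J.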
EDIT:
You can't use the LOCK prefix with the ADD instruction if the source operand is a memory operand!
From: "Intel® 64 and IA-32 Architectures Software Developer’s Manual"
The LOCK prefix can be prepended only
to the following instructions and only
to those forms of the instructions
where the destination operand is a
memory operand: ADD, ADC, AND, BTC,
BTR, BTS, CMPXCHG, CMPXCH8B, DEC, INC,
NEG, NOT, OR, SBB, SUB, XOR, XADD, and
XCHG. If the LOCK prefix is used with
one of these instructions and the
source operand is a memory operand, an
undefined opcode exception (#UD) may
be generated. An undefined opcode
exception will also be generated if
the LOCK prefix is used with any
instruction not in the above list. The
XCHG instruction always asserts the
LOCK# signal regardless of the
presence or absence of the LOCK prefix.
EDIT2:
From: "Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A"
8.1.1 Guaranteed Atomic Operations.
The Intel486 processor (and newer
processors since) guarantees that the
following basic memory operations will
always be carried out atomically:
- Reading or writing a byte
- Reading or writing a word aligned
on a 16-bit boundary
- Reading or writing a doubleword aligned on a 32-bit boundary
The Pentium processor (and newer
processors since) guarantees that the
following additional memory operations
will always be carried out atomically:
- Reading or writing a quadword aligned on a 64-bit boundary
- 16-bit accesses to uncached memory locations that fit within a 32-bit data bus
The P6 family processors
(and newer processors since)
guarantee that the following
additional memory operation will
always be carried out atomically:
- Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit
within a cache line
Accesses to cacheable memory that are
split across bus widths, cache lines,
and page boundaries are not guaranteed
to be atomic by the Intel Core 2 Duo,
Intel Core Duo, Pentium M, Pentium 4,
Intel Xeon, P6 family, Pentium, and
Intel486 processors. The Intel Core 2
Duo, Intel Core Duo, Pentium M,
Pentium 4, Intel Xeon, and P6 family
processors provide bus control signals
that permit external memory subsystems
to make split accesses atomic;
however, nonaligned data accesses will
seriously impact the performance of
the processor and should be avoided.
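The alignment conditions in the quote above can be checked with a little address arithmetic. The following is a rough sketch, not anything taken from the manual: the helper names are hypothetical, and the 64-byte cache line size is an assumption (common on current x86 processors, but not stated in the quoted text).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. The quoted text does not give a size. */
#define CACHE_LINE_SIZE 64u

/* True when addr is a multiple of size. size must be a power of two,
   e.g. 2, 4 or 8 for word, doubleword and quadword accesses. */
bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* True when all bytes in [addr, addr + size) lie in one cache line,
   i.e. the first and the last byte share the same line index. */
bool fits_in_cache_line(uintptr_t addr, size_t size)
{
    assert(size != 0 && size <= CACHE_LINE_SIZE);
    return (addr / CACHE_LINE_SIZE) == ((addr + size - 1) / CACHE_LINE_SIZE);
}
```

For example, a 4-byte access at address 0x103E is not naturally aligned and also crosses a 64-byte line boundary, so none of the guarantees listed above would cover it.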
So, for reading I prefer to use the CMPXCHG instruction with the LOCK prefix (the destination has to be the memory operand), like:
LOCK CMPXCHG [J], EAX
For writing:
MOV [J], EAX
SFENCE
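A rough C11 equivalent of the read and write patterns above might look like the sketch below. The function names are hypothetical. Note one deliberate swap: C11 has no direct counterpart to SFENCE, so the write uses atomic_thread_fence with memory_order_seq_cst, which is a full fence and therefore stronger than a store-only fence.

```c
#include <assert.h>
#include <stdatomic.h>

/* Reads *target through a compare-and-swap. If *target equals guess, the
   same value is written back, so memory is unchanged. Otherwise guess is
   overwritten with the current value. Either way guess ends up holding
   the value of *target. */
int read_via_cas(atomic_int *target)
{
    int guess = 0;
    atomic_compare_exchange_strong(target, &guess, guess);
    return guess;
}

/* Plain store followed by a fence. The seq_cst fence is a full fence,
   used here in place of SFENCE (an assumption of this sketch). */
void write_with_fence(atomic_int *target, int value)
{
    atomic_store_explicit(target, value, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Small self-check of both helpers on a local variable. */
void read_write_self_check(void)
{
    atomic_int j;
    atomic_init(&j, 7);
    assert(read_via_cas(&j) == 7);
    write_with_fence(&j, 42);
    assert(read_via_cas(&j) == 42);
}
```

On x86 a compiler would typically emit LOCK CMPXCHG for the compare-and-swap, though that is up to the compiler.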