I know how to atomically write a value in x86 ASM. But how do I read one? The LOCK prefix can't be used with mov.
To increase a value, I am doing:
lock inc dword ptr Counter
How do I read Counter in a thread-safe way?
I'm not an assembly expert, but aligned word-sized (on x86, 32-bit) reads and writes should be atomic already.
You need to lock the increment because it is both a read AND a write: without the lock, another processor could modify the value between the two.
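To make the distinction concrete, here is a minimal sketch in C11 rather than assembly. The names `counter`, `counter_increment`, and `counter_read` are made up for illustration. On x86, compilers typically turn the increment into a locked read-modify-write instruction and the read into an ordinary mov.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption of this sketch: the atomic type is naturally aligned,
   which is what makes a plain load atomic on x86. */
static_assert(_Alignof(_Atomic(int32_t)) >= sizeof(int32_t),
              "counter must be naturally aligned");

/* Hypothetical shared counter, standing in for Counter in the question. */
static _Atomic(int32_t) counter = 0;

/* Read-modify-write: needs a locked instruction on x86
   (the C equivalent of lock inc dword ptr Counter). */
void counter_increment(void) {
    atomic_fetch_add(&counter, 1);
}

/* Plain atomic read: no LOCK prefix is needed for an aligned 32-bit load. */
int32_t counter_read(void) {
    return atomic_load(&counter);
}
```

Any thread can call `counter_read()` while other threads call `counter_increment()`; it will always see some value the counter actually held, never a torn one.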
For a simple read, it's mostly about alignment. The easiest way to ensure atomic reading is to always use "natural" alignment, that is, an alignment at least as great as the size of the item (e.g., a 32-bit item is 32-bit aligned).
Misaligned reads aren't necessarily atomic. For an extreme example, consider reading a 32-bit value at an odd address where the first byte is in one cache line, and the other three bytes are in another cache line. In such a case, an atomic read is essentially impossible.
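As a rough illustration of the two conditions above, the following C sketch checks whether an access is naturally aligned and whether it straddles a cache line. The helper names are hypothetical, and the 64-byte line size is an assumption (common on current x86 parts, but not universal).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; check your CPU's documentation. */
#define CACHE_LINE_SIZE 64u

/* True if addr is a multiple of size (size must be a power of two). */
bool is_naturally_aligned(uintptr_t addr, size_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* True if the bytes [addr, addr + size) span two cache lines. */
bool crosses_cache_line(uintptr_t addr, size_t size) {
    assert(size != 0);
    return (addr / CACHE_LINE_SIZE) != ((addr + size - 1) / CACHE_LINE_SIZE);
}
```

For the extreme example above, a 32-bit read at address 63 has its first byte in one line and the remaining three in the next, so `crosses_cache_line(63, 4)` is true. A naturally aligned item no larger than a cache line can never straddle one.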
Since most (if not all) of these processors use a 64-bit wide memory bus, the largest item you can expect to read atomically with a single ordinary load is 64 bits.
As I explained in this post, the Intel manual says:
Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.
So use:
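LOCK CMPXCHG [J], EAX
Note that the memory operand must be the destination. LOCK CMPXCHG first acts as a full memory fence and then compares EAX with the destination value. If they differ, the destination value is loaded into EAX; if they are equal, EAX already holds it (and the same value is written back). Either way, EAX ends up containing the current value of the destination.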
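The same compare-and-exchange trick can be sketched in C11. The function name `read_via_cas` is invented for this example. On x86, `atomic_compare_exchange_strong` is normally compiled to lock cmpxchg, assuming the atomic type is lock-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: int atomics are always lock-free (true on x86), so the
   compare-exchange maps to a locked instruction rather than a mutex. */
static_assert(ATOMIC_INT_LOCK_FREE == 2,
              "sketch assumes lock-free int atomics");

/* Reads *target by attempting a compare-exchange whose new value equals
   the guessed old value, so memory is never changed:
   - guess is right: the same value is stored back, expected is untouched
   - guess is wrong: the current value is copied into expected */
int32_t read_via_cas(_Atomic(int32_t) *target) {
    int32_t expected = 0; /* arbitrary guess; any value works */
    atomic_compare_exchange_strong(target, &expected, expected);
    return expected;
}
```

Because the operation may write, the target must be in writable memory. For a naturally aligned value, a plain `atomic_load` is the cheaper choice.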
EDIT: Links:
Intel® 64 and IA-32 Architectures Software Developer’s Manuals
In Volume 3A: System Programming Guide, check section 8.1.1.
Also check the Optimization Reference Manual, Chapter 7: Optimizing Cache Usage.
It is interesting to read the other replies. I think @GJ is probably on the money.
For many years it was simply taken for granted that 32-bit reads and writes were atomic. With today's really aggressive caching, that holds only for aligned accesses; an access split across cache lines is no longer guaranteed to be atomic.
I guess that's why I prefer C++, Java or some such between me and the machine code. These days the machine code is too complex to write reliably (unless you do it a lot to keep your skills sharp). Luckily, today's optimising compilers are so good that you seldom need the performance of hand-optimised assembler.