I know how to atomically write a value in x86 ASM. But how do I read one? The LOCK prefix can't be used with mov.

To increase a value, I am doing:

lock inc dword ptr Counter

How do I read Counter in a thread-safe way?

+2  A: 

I'm not an assembly expert, but word-sized (on x86, 32-bit) reads/writes should be atomic already.

The reason you need to lock the increment is that it's both a read AND a write.
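A rough C11 sketch of that distinction, not a definitive implementation. The names `counter`, `counter_increment`, `counter_read` and `counter_demo` are made up for illustration. On x86, compilers typically turn the increment into a LOCK-prefixed instruction and the aligned 32-bit load into an ordinary MOV.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared counter; file-scope, so it starts at zero and is
   naturally aligned by the compiler. */
_Atomic uint32_t counter;

/* Read-modify-write: must be a single atomic operation, otherwise two
   threads could both read the same old value and lose an update. */
void counter_increment(void) {
    atomic_fetch_add(&counter, 1);
}

/* Plain read: an atomic load of an aligned 32-bit value. */
uint32_t counter_read(void) {
    return atomic_load(&counter);
}

/* Single-threaded smoke check only; real use would call the two
   functions above from several threads. */
uint32_t counter_demo(void) {
    atomic_store(&counter, 0);
    for (int i = 0; i < 1000; i++) {
        counter_increment();
    }
    uint32_t seen = counter_read();
    assert(seen == 1000);
    return seen;
}
```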

Mike Caron
I think you're right. That makes sense. Thanks.
IanC
Don't forget to accept an answer if it helped you ;)
Mike Caron
Not always! If the memory address is in a cache used by a second CPU in a multi-CPU system, the read isn't guaranteed to be atomic. So use "LOCK CMPXCHG EAX, [var]", which first fences the memory cache.
GJ
@GJ: I think this only applies to misaligned data. Normally you would not have misaligned data, so it shouldn't be an issue?
Paul R
I know the read wouldn't be atomic, but it would still be a snapshot, which means the value should be correct, surely? Even if you had 2 CPUs and their cache was being synchronized, I don't think LOCK is going to play any part in ensuring the value is the latest before the var is read... or will it?
IanC
@Paul R: Not when two threads are running at the same time, each on its own CPU, and accessing the same memory address. In that case cache synchronisation is needed. Some instructions like "LOCK CMPXCHG" do this automatically. Instructions like MOV first need a memory fence instruction to synchronise cached memory. Check: Intel® 64 and IA-32 Architectures Software Developer’s Manuals. I have added links to my answer.
GJ
A: 

For a simple read, it's mostly about alignment. The easiest way to ensure atomic reads is to always use "natural" alignment, i.e., alignment at least as great as the size of the item (e.g., a 32-bit item is 32-bit aligned).

Misaligned reads aren't necessarily atomic. For an extreme example, consider reading a 32-bit value at an odd address where the first byte is in one cache line, and the other three bytes are in another cache line. In such a case, an atomic read is essentially impossible.

Since (at least most) processors use a 64-bit wide memory bus, the largest item that can hope to be read atomically is 64 bits.
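As a small illustration of the alignment points above, here is a hedged C sketch that works on raw address values. The helper names and the 64-byte `CACHE_LINE_SIZE` are assumptions for the example (64 bytes is common on current x86 parts, but check your own target).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; not taken from the thread. */
#define CACHE_LINE_SIZE 64u

/* True if an item of `size` bytes at `addr` is naturally aligned,
   i.e. the address is a multiple of the item size. */
bool is_naturally_aligned(uintptr_t addr, size_t size) {
    assert(size != 0);
    return addr % size == 0;
}

/* True if the bytes [addr, addr + size) straddle a cache line boundary,
   which is the case where an atomic read cannot be counted on. */
bool crosses_cache_line(uintptr_t addr, size_t size) {
    assert(size != 0);
    return (addr / CACHE_LINE_SIZE) != ((addr + size - 1) / CACHE_LINE_SIZE);
}
```

For example, a 32-bit item at address 61 is neither naturally aligned nor contained in one line (it covers bytes 61 to 64), while the same item at address 60 is both.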

Jerry Coffin
Some SSE instructions on x86 support 128-bit atomic load/store; of course, you must first ensure memory alignment. Look at the movdqa instruction!
GJ
+3  A: 

As I explained to you in this post:

Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.

So use:

LOCK        CMPXCHG   EAX, [J]

LOCK CMPXCHG first fences the cache and then compares EAX with the destination value; if they are not equal, EAX receives the destination value.
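For reference, the instruction's form is CMPXCHG r/m32, r32: the memory location is the first (destination) operand, and the LOCK prefix requires a memory destination. The compare-and-swap "read" idea can be sketched in C11 as below. This is only an illustration under my own assumptions; `read_via_cas` and `read_via_cas_demo` are hypothetical names. Note that a CAS counts as a write, so it needs writable memory and costs more than a plain aligned load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Returns the current value of *p using a compare-and-swap.
   Expected and desired are both 0, so:
   - if *p == 0, the CAS succeeds and stores 0 again (no visible change);
   - otherwise it fails and `expected` is overwritten with the current
     value, just as CMPXCHG loads the destination into EAX on mismatch. */
uint32_t read_via_cas(_Atomic uint32_t *p) {
    uint32_t expected = 0;
    atomic_compare_exchange_strong(p, &expected, 0);
    return expected;
}

/* Single-threaded check that the value is returned and left untouched. */
uint32_t read_via_cas_demo(void) {
    _Atomic uint32_t value = 42;
    uint32_t seen = read_via_cas(&value);
    assert(atomic_load(&value) == 42);
    return seen;
}
```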

EDIT: Links to:

Intel® 64 and IA-32 Architectures Software Developer’s Manuals

In Volume 3A: System Programming Guide check section 8.1.1

Also check the Optimization Reference Manual, Chapter 7: Optimizing Cache Usage

GJ
That won't compile since [J] is a memory pointer. It has to be a register value. This is the catch-22 I can't get around.
IanC
I see from your other post that this actually isn't an issue so long as the value is aligned and within the CPU's bus width.
IanC
+1  A: 

It is interesting to read the other replies. I think @GJ is probably on the money.

For many years it was always true that 32-bit reads and writes were atomic. It is only in recent years, with really aggressive caching, that this is no longer guaranteed.

I guess that's why I prefer C++, Java or some such between me and the machine code. These days the machine code is too complex to write reliably (unless you do it a lot to keep your skills sharp). Luckily, today's optimising compilers are so good that you seldom need the performance of hand-optimised assembler.

Michael J