ansaurus

Question

Why doesn't this code demonstrate the non-atomicity of reads/writes?

Answer 1

+3 A:

John Kugelman 2010-09-09 18:10:39

@John Kugelman: I can see that the `KeepReading` method sees different values of `_d` in the debugger. Also, looking at the IL, the first line *inside* the loop is `ldsfld float64 Tester.Program::d`, so there is no compiler optimization going on.

Ani 2010-09-09 18:16:31

The debugger is not Schrödinger-safe. You're trying to test something very low level. This is getting a bit beyond my pay grade, but I suspect the JIT optimizer could optimize the reads away at runtime. It's hard to say; it could be #2, #3, or #4 as well.

John Kugelman 2010-09-09 18:21:20

Answer 2

+2 A:

Peter Johansson 2010-09-09 18:19:39

As an aside, you might try using a construct other than `Assert`.

Peter Johansson 2010-09-09 18:24:18

@Peter Johansson: In release mode, I replaced it with `if(...) throw`.

Ani 2010-09-09 18:39:47

Removing dCopy does nothing (except make the thread read _d twice instead of once).

Henk Holterman 2010-09-09 18:41:38

Answer 3

+7 A:

You might try running it through CHESS to see if it can force an interleaving that breaks the test.

If you take a look at the x86 disassembly (visible from the debugger), you might also see whether the jitter is generating instructions that preserve atomicity.

EDIT: I went ahead and looked at the disassembly (forcing the x86 platform target). The relevant lines are:

                double dCopy = _d;
    00000039  fld         qword ptr ds:[00511650h]
    0000003f  fstp        qword ptr [ebp-40h]

                _d = rand.Next(2) == 0 ? 0D : double.MaxValue;
    00000054  mov         ecx,dword ptr [ebp-3Ch]
    00000057  mov         edx,2
    0000005c  mov         eax,dword ptr [ecx]
    0000005e  mov         eax,dword ptr [eax+28h]
    00000061  call        dword ptr [eax+1Ch]
    00000064  mov         dword ptr [ebp-48h],eax
    00000067  cmp         dword ptr [ebp-48h],0
    0000006b  je          00000079
    0000006d  nop
    0000006e  fld         qword ptr ds:[002423D8h]
    00000074  fstp        qword ptr [ebp-50h]
    00000077  jmp         0000007E
    00000079  fldz
    0000007b  fstp        qword ptr [ebp-50h]
    0000007e  fld         qword ptr [ebp-50h]
    00000081  fstp        qword ptr ds:[00159E78h]

In both cases the write to `_d` is a single `fstp qword ptr` instruction, and the read is a single `fld qword ptr`. My guess is that the Intel CPU guarantees atomicity of these operations, though I haven't found any documentation to support this. Can any x86 gurus confirm this?
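To see why the test's `dCopy == 0D || dCopy == double.MaxValue` check would catch a torn value if one ever occurred, it helps to look at the bit patterns. The sketch below (class and method names are hypothetical, made up for illustration) splices the high 32 bits of one value onto the low 32 bits of the other, which is what an interleaving of two separate 32-bit stores could produce, and runs the same check on the result.

```csharp
using System;

// Sketch with hypothetical names: splice the 32-bit halves of two doubles the
// way two separate 32-bit stores could interleave, then apply the test's check.
static class TornDoubleDemo
{
    const long LowMask = 0xFFFFFFFFL; // low 32 bits

    // The same predicate the test applies to dCopy.
    public static bool PassesCheck(double d) => d == 0D || d == double.MaxValue;

    // High 32 bits taken from 'high', low 32 bits taken from 'low'.
    public static double Splice(double high, double low)
    {
        long hi = BitConverter.DoubleToInt64Bits(high) & ~LowMask;
        long lo = BitConverter.DoubleToInt64Bits(low) & LowMask;
        return BitConverter.Int64BitsToDouble(hi | lo);
    }

    static void Main()
    {
        // double.MaxValue: exponent 0x7FE, mantissa all ones -> 7FEFFFFFFFFFFFFF
        Console.WriteLine(BitConverter.DoubleToInt64Bits(double.MaxValue).ToString("X16"));

        double torn1 = Splice(double.MaxValue, 0D); // bits 7FEFFFFF00000000
        double torn2 = Splice(0D, double.MaxValue); // bits 00000000FFFFFFFF (a subnormal)
        Console.WriteLine(PassesCheck(torn1)); // False
        Console.WriteLine(PassesCheck(torn2)); // False
    }
}
```

Neither mixed pattern equals `0D` or `double.MaxValue`, so a tear would be detected rather than slipping through.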

UPDATE:

This fails as expected if you use Int64. The JIT then uses the 32-bit general-purpose registers of the x86 CPU rather than the FPU registers, so the 64-bit read becomes two separate 32-bit loads. You can see this below:

                Int64 dCopy = _d;
    00000042  mov         eax,dword ptr ds:[001A9E78h]
    00000047  mov         edx,dword ptr ds:[001A9E7Ch]
    0000004d  mov         dword ptr [ebp-40h],eax
    00000050  mov         dword ptr [ebp-3Ch],edx
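A bounded variant of the Int64 experiment might look like the sketch below. The names (`TornInt64Test`, `CountTornReads`, `ReadValue`), the iteration count, and the choice of `0L` and `-1L` as the two written values are assumptions for illustration; `-1L` has every bit set, so any mix of 32-bit halves differs from both values. Reads go through a non-inlined method so the JIT cannot hoist the load out of the loop (the concern raised in the comments on Answer 1). In an x86 process on a multi-core machine you would expect a non-zero count; in a 64-bit process the read is a single 64-bit load and the count should be zero.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Sketch (hypothetical names): count torn Int64 reads while another thread
// keeps overwriting the field with one of two values.
static class TornInt64Test
{
    const long A = 0L;
    const long B = -1L; // every bit set, so a mix of 32-bit halves matches neither A nor B

    static long _value;
    static volatile bool _stop;

    // NoInlining keeps the JIT from hoisting the field load out of the caller's loop.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static long ReadValue() => _value;

    static void Writer()
    {
        var rand = new Random();
        while (!_stop)
            _value = rand.Next(2) == 0 ? A : B; // on x86 this becomes two 32-bit stores
    }

    public static long CountTornReads(int iterations)
    {
        _stop = false;
        var writer = new Thread(Writer) { IsBackground = true };
        writer.Start();

        long torn = 0;
        for (int i = 0; i < iterations; i++)
        {
            long copy = ReadValue(); // read once, then compare the local copy
            if (copy != A && copy != B)
                torn++;
        }

        _stop = true;
        writer.Join();
        return torn;
    }

    static void Main()
    {
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
        Console.WriteLine($"Torn reads: {CountTornReads(100_000_000)}");
    }
}
```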

UPDATE:

I was curious whether this would fail if I forced the double field off 8-byte alignment in memory, so I put together this code:

    using System;
    using System.Diagnostics;
    using System.Runtime.InteropServices;
    using System.Threading;

    class Program
    {
        // _d2 is placed at offset 4, overlapping _d1, to try to force it
        // off 8-byte alignment.
        [StructLayout(LayoutKind.Explicit)]
        private struct Test
        {
            [FieldOffset(0)]
            public double _d1;

            [FieldOffset(4)]
            public double _d2;
        }

        private static Test _test;

        [STAThread]
        static void Main()
        {
            new Thread(KeepMutating).Start();
            KeepReading();
        }

        private static void KeepReading()
        {
            while (true)
            {
                double dummy = _test._d1;
                double dCopy = _test._d2;

                // Debug.Assert is compiled out of Release builds;
                // there, use if (...) throw ... instead.
                Debug.Assert(dCopy == 0D || dCopy == double.MaxValue); // Never fails
            }
        }

        private static void KeepMutating()
        {
            Random rand = new Random();
            while (true)
            {
                _test._d2 = rand.Next(2) == 0 ? 0D : double.MaxValue;
            }
        }
    }

It does not fail, and the generated x86 instructions are essentially the same as before:

                double dummy = _test._d1;
    0000003e  mov         eax,dword ptr ds:[03A75B20h]
    00000043  fld         qword ptr [eax+4]
    00000046  fstp        qword ptr [ebp-40h]
                double dCopy = _test._d2;
    00000049  mov         eax,dword ptr ds:[03A75B20h]
    0000004e  fld         qword ptr [eax+8]
    00000051  fstp        qword ptr [ebp-48h]

I experimented with swapping `_d1` and `_d2` (which one is read into `dCopy` and which one is written) and also tried a `FieldOffset` of 2. All variants generated the same basic instructions (with different offsets than above), and none of them failed after several seconds of running (likely billions of attempts). Given these results, I'm cautiously confident that at least Intel x86 CPUs make double load/store operations atomic, regardless of alignment.

Dan Bryant 2010-09-09 18:21:30

The value can change between the two comparisons: first it is MaxValue, so it doesn't equal 0; then it is 0, so it doesn't equal MaxValue.

Dan Bryant 2010-09-09 19:58:54
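The race described in the comment above can be reproduced deterministically, without any threads. In this sketch (all names hypothetical), the condition is evaluated against a reader function instead of a local copy, so each comparison performs its own read, just as using `_d` directly instead of `dCopy` would. Feeding it MaxValue and then 0 makes the check fail even though every individual read returned a legal value.

```csharp
using System;
using System.Collections.Generic;

// Sketch: why the test copies _d into dCopy before comparing.
static class DoubleReadRace
{
    // Reads twice, once per comparison (like comparing _d directly).
    public static bool CheckReadingTwice(Func<double> read) =>
        read() == 0D || read() == double.MaxValue;

    // Reads once into a local copy (like dCopy), then compares the copy.
    public static bool CheckReadingOnce(Func<double> read)
    {
        double copy = read();
        return copy == 0D || copy == double.MaxValue;
    }

    // A reader that yields the given values in order, simulating the writer
    // thread changing the field between reads.
    public static Func<double> Sequence(params double[] values)
    {
        var queue = new Queue<double>(values);
        return () => queue.Dequeue();
    }

    static void Main()
    {
        Console.WriteLine(CheckReadingTwice(Sequence(double.MaxValue, 0D))); // False
        Console.WriteLine(CheckReadingOnce(Sequence(double.MaxValue, 0D)));  // True
    }
}
```

A failure from the two-read version says nothing about tearing, which is why the test reads the field exactly once per iteration.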

Answer 4

A:

IMO the correct answer is #5.

A `double` is 8 bytes long.

The memory interface is 64 bits (8 bytes) wide per module, i.e. 8 bytes per transfer (16 bytes for dual-channel memory).

There are also CPU caches. On my machine the cache line is 64 bytes, and on all CPUs it is a multiple of 8 bytes.

As shown above, even when the CPU is running in 32-bit mode, double variables are loaded and stored with a single instruction.

That's why, as long as your double variable is aligned (I suspect the CLR does the alignment for you), double reads and writes are atomic.

Soonts 2010-09-09 19:46:09
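The alignment argument in the answer above can be made concrete with a little address arithmetic. The sketch below (hypothetical helper names; the 64-byte line size is the figure quoted in the answer, not a universal constant) checks whether an 8-byte access at a given address is 8-byte aligned and whether it would straddle a cache-line boundary. Because the line size is a multiple of 8, an aligned 8-byte access can never straddle two lines, while a misaligned one sometimes does.

```csharp
using System;

// Sketch: address arithmetic behind "aligned doubles stay within one cache line".
static class AlignmentCheck
{
    const int AccessSize = 8;     // sizeof(double)
    const int CacheLineSize = 64; // assumed line size, as on the machine described above

    // Addresses are assumed to be non-negative.
    public static bool IsAligned(long address) => address % AccessSize == 0;

    // True if the bytes [address, address + AccessSize) span two cache lines.
    public static bool CrossesCacheLine(long address) =>
        address / CacheLineSize != (address + AccessSize - 1) / CacheLineSize;

    static void Main()
    {
        // 0x1004 is misaligned but stays in one line; 0x103C is misaligned and straddles two lines.
        foreach (long addr in new long[] { 0x1000, 0x1004, 0x103C, 0x1040 })
        {
            Console.WriteLine($"0x{addr:X}: aligned={IsAligned(addr)}, crossesLine={CrossesCacheLine(addr)}");
        }
    }
}
```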
