I am having an issue with a 32-bit legacy app running on 64-bit Windows. The app uses CreateFileMapping to create shared memory. When it runs on 64-bit Windows, any attempt to access this shared memory from another process takes about 1 second. The shared memory is created using these page protection flags:

flProtect = PAGE_READONLY | SEC_NOCACHE | SEC_COMMIT;

When the same memory is created using:

flProtect = PAGE_READONLY | SEC_COMMIT;

the issue disappears. For now this workaround is acceptable, but we do have some devices that require the SEC_NOCACHE flag to be set.
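For reference, the two flag combinations above can be sketched as a small helper. This is only an illustration: `section_protect` and `create_shared_section` are hypothetical names, the paging-file backing (`INVALID_HANDLE_VALUE`) is an assumption about how the section is created, and the fallback constants exist only so the flag logic compiles off Windows (their values mirror winnt.h).

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Fallback definitions so the flag logic compiles off Windows;
   the values mirror those in winnt.h. */
typedef unsigned long DWORD;
#define PAGE_READONLY 0x02
#define SEC_COMMIT    0x08000000
#define SEC_NOCACHE   0x10000000
#endif

/* Hypothetical helper: build flProtect, adding SEC_NOCACHE only when the
   attached device is known to require it. */
DWORD section_protect(int device_needs_nocache)
{
    DWORD prot = PAGE_READONLY | SEC_COMMIT;
    if (device_needs_nocache)
        prot |= SEC_NOCACHE;
    return prot;
}

#ifdef _WIN32
/* Hypothetical helper: create a named section backed by the paging file
   (an assumption; the original code may pass a real file handle). */
HANDLE create_shared_section(const char *name, DWORD size,
                             int device_needs_nocache)
{
    assert(name != NULL && size > 0);
    return CreateFileMappingA(INVALID_HANDLE_VALUE, NULL,
                              section_protect(device_needs_nocache),
                              0, size, name);
}
#endif
```

Keeping the flag choice in one function makes it easy to switch SEC_NOCACHE on only for the machines that need it.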

Can someone enlighten me on why SEC_NOCACHE would affect performance in this situation?

Update: it seems that only writing to this buffer has slowed down, to about 1000ms. Reading does not seem to be affected. We are writing about 5MB to the buffer in this time.
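A rough way to reproduce the write timing is sketched below. It is not the measurement code used here: `time_fill_ms` is a hypothetical helper, and it assumes `dst` is a writable pointer (on Windows, a view returned by MapViewOfFile that was opened with write access). It uses the C11 `timespec_get` wall clock, which is not monotonic, so treat the numbers as approximate.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: fill `len` bytes at `dst` with `value` and return
   the elapsed wall-clock time in milliseconds. */
double time_fill_ms(void *dst, size_t len, int value)
{
    struct timespec t0, t1;

    timespec_get(&t0, TIME_UTC);
    memset(dst, value, len);          /* the write being measured */
    timespec_get(&t1, TIME_UTC);

    return (double)(t1.tv_sec - t0.tv_sec) * 1e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling it once on a view created with SEC_NOCACHE and once on a view created without it, each with a length of about 5MB, would show whether the difference matches the figures above.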

Update2: This software is used on many systems, and one of the systems has a physical device that requires the use of this flag. We are currently limited to running the machine with this device in 32-bit Windows.
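One way to ship a single 32-bit binary to both kinds of machine is to decide at run time whether to request SEC_NOCACHE. The sketch below is an assumption about how that could look, not the code in use: `running_under_wow64` and `use_nocache` are hypothetical names, and the policy (drop the flag whenever the process runs under WOW64) simply mirrors the workaround described above. `IsWow64Process` is looked up dynamically because it is not exported by every Windows version.

```c
#ifdef _WIN32
#include <windows.h>

/* Returns nonzero when this 32-bit process runs on 64-bit Windows. */
int running_under_wow64(void)
{
    typedef BOOL (WINAPI *IsWow64Process_t)(HANDLE, PBOOL);
    BOOL wow64 = FALSE;
    IsWow64Process_t fn = (IsWow64Process_t)
        GetProcAddress(GetModuleHandleA("kernel32"), "IsWow64Process");

    if (fn == NULL || !fn(GetCurrentProcess(), &wow64))
        return 0;   /* API missing or call failed: assume native 32-bit */
    return wow64 != FALSE;
}
#else
/* Stub so the policy function below can be compiled off Windows. */
int running_under_wow64(void) { return 0; }
#endif

/* Hypothetical policy: request SEC_NOCACHE only when the device needs it
   and the process is not running under WOW64. */
int use_nocache(int under_wow64, int device_needs_nocache)
{
    return device_needs_nocache && !under_wow64;
}
```

A caller would pass `use_nocache(running_under_wow64(), device_present)` to whatever code builds the flProtect value.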

A: 

I would guess that because the memory has to be remapped from 64-bit to 32-bit, it becomes expensive to provide a 'bounce' buffer. When caching is enabled, this bounce buffer is implicit and the OS may circumvent the need to continuously update the memory section.

Christopher
This is what I am leaning towards, but 1000ms seems to be an exorbitant amount of time for these operations.
Justin
Bounce buffers are only needed for DMA hardware, which needs a 32-bit *physical* address. In 32-bit applications in user mode, there is translation done by the MMU and any address in memory, even above 4GB, can be mapped to a 32-bit virtual address.
Ben Voigt
@Ben: even without bounce buffers, can the mapping from 64-bit to 32-bit add this kind of overhead?
Justin
No, the MMU is used in exactly the same manner on a 32-bit OS.
Ben Voigt
+1  A: 

You are disabling the file system cache with that flag. Yes, that makes an enormous difference: it forces the OS to work with the disk driver and read sectors directly. Whole cylinders can no longer be read and cached, which disables the optimization that makes reading tracks without moving the read head so cheap. Lazy write-back is disabled as well, an optimization that makes disk writes appear instantaneous.

Hans Passant
This is a memory-mapped file; there is no hard disk access involved.
Justin
There is. If you specify INVALID_HANDLE_VALUE for CreateFileMapping's hFile argument, then the mapping is backed by the paging file. Check the SDK docs for it.
Hans Passant
Why would this change write performance from 32-bit to 64-bit? If this were a big issue, would I not see the performance hit in 32-bit as well?
Justin
32-bit code is running in an emulation layer called Wow64. Implementation details are *very* hard to come by; this isn't documented anywhere. I suspect that 64-bit drivers need to jump through hoops to deal with pages that were allocated in the emulation layer.
Hans Passant
I have selected this as the answer because I believe there is an issue in the wow64 layer.
Justin
+3  A: 

Here's what Microsoft has to say about that flag:

The SEC_NOCACHE flag is intended for architectures that require various locking structures to be located in memory that is not ever fetched into the CPU cache. On x86 and MIPS machines, use of this flag just slows down the performance because the hardware keeps the cache coherent.

Unfortunately, they don't quantify the amount of slowdown.

Mark Ransom
This is true; however, in 32-bit this slowdown is less than 10ms. In 64-bit it is in the range of 1000ms.
Justin
@Justin: are both of the processes 32-bit? At least on *NIXes there is a similar situation with shared memory created by a 32-bit process and accessed from a 64-bit process, and vice versa. And BTW, if the cited article has any truth behind it, then throw away the flag: all multi-CPU/multi-core systems Windows runs on now are explicitly cache-coherent (unlike the first SMP systems that WinNT 3.5 ran on more than a decade ago).
Dummy00001
@Dummy: both processes are 32-bit. Thanks for the multi-core point, I will look into that. There is only one piece of hardware we use that needs this flag, and it is used on a 32-bit system. Currently we check whether the OS is 32-bit or 64-bit and set the flags accordingly. I am just curious about the root cause of this slowdown.
Justin