Our application is:

  1. Hardware configuration is a dual Xeon server running 64-bit Windows 7. Each Xeon has its own 12 GB of RAM in a [NUMA][1] configuration, with a bridge connecting the two memory regions.
  2. All software is written in C++ using VS2008 and compiled as 64-bit applications.
  3. A Generation app creates a large shared memory region (4-6 GB) that is only going to be accessed by processes whose processor affinity is set to run on the first Xeon processor.
  4. A Receiving app creates a large shared memory region (2-4 GB) that is primarily used by processes whose processor affinity is set to run on the second Xeon processor. However, when the Generation app finishes building one set of data (32 MB to 128 MB), it transfers that data to the Receiving app's shared memory region on the second Xeon.
  5. We are using the Boost Interprocess library to manage our shared memory regions.

My question is: when each of these processes creates its shared memory region, does Windows allocate that memory on the same Xeon chip as the process that created it? Or should I explicitly assign the memory to a particular Xeon chip using one of the NUMA memory functions?


EDIT - to help clarify what NUMA is, from Wikipedia -

Non-Uniform Memory Access or Non-Uniform Memory Architecture (NUMA) is a computer memory design used in multiprocessors, where the memory access time depends on the memory location relative to a processor. Under NUMA, a processor can access its own local memory faster than non-local memory, that is, memory local to another processor or memory shared between processors.

Link is http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access for more details. To me, it is one of those things that we are all going to have to learn more about as multiprocessing becomes more common.

[1]: http://msdn.microsoft.com/en-us/library/aa363804%28VS.85%29.aspx

A: 

Windows will allocate memory local to the requesting thread; however, Microsoft does not specify what "local" means. It could be any of three things: the thread's ideal processor, the thread's processor affinity mask, or the processor the thread is currently running on (I forget which one the current implementation uses).
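Because "local" is left unspecified, one way to avoid depending on it is to name the node yourself. The following is a minimal sketch, assuming a Windows build where `VirtualAllocExNuma` is available; the helper names `alloc_preferring_node` and `free_node_memory` are hypothetical, and the non-Windows branch exists only so the sketch compiles elsewhere. It allocates private memory rather than a Boost Interprocess region; for file mappings, `CreateFileMappingNuma` takes a similar preferred-node argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: allocate `bytes` of zero-initialized memory and ask
// Windows to prefer NUMA node `node` for the physical pages behind it.
// Returns NULL on failure.
void* alloc_preferring_node(std::size_t bytes, unsigned long node) {
    assert(bytes > 0);
#ifdef _WIN32
    // The last argument is the preferred node (nndPreferred).
    return VirtualAllocExNuma(GetCurrentProcess(), NULL, bytes,
                              MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE, node);
#else
    // Fallback for non-Windows builds: plain zeroed heap memory, no NUMA hint.
    (void)node;
    return std::calloc(1, bytes);
#endif
}

// Hypothetical helper: release memory obtained from alloc_preferring_node.
void free_node_memory(void* p) {
#ifdef _WIN32
    if (p != NULL) {
        VirtualFree(p, 0, MEM_RELEASE);
    }
#else
    std::free(p);
#endif
}
```

A caller on the first Xeon would pass node 0 and a caller on the second would presumably pass node 1, but the actual node numbering on a given machine is an assumption that should be checked with `GetNumaHighestNodeNumber` and `GetNumaProcessorNode`.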

In essence, the answer is yes; however, a common gotcha is allocating all memory from a "controller thread" that isn't affinitized, so the memory ends up near the controller rather than near the threads that have a specific affinity.
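A cautious way around that gotcha is to affinitize the allocating thread before it allocates and first touches the memory. The sketch below is an illustration, not a definitive implementation: `pin_current_thread_to_node` and `allocate_and_touch` are hypothetical names, and the non-Windows branch is a stub so the code compiles elsewhere. It sets both the affinity mask and the ideal processor, so it should behave the same whichever definition of "local" Windows applies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: restrict the calling thread to the processors of NUMA
// node `node` and make one of them its ideal processor.
// Returns true on success; the non-Windows stub always returns false.
bool pin_current_thread_to_node(unsigned char node) {
#ifdef _WIN32
    ULONGLONG mask = 0;
    if (!GetNumaNodeProcessorMask(node, &mask) || mask == 0) {
        return false;
    }
    HANDLE self = GetCurrentThread();
    if (SetThreadAffinityMask(self, static_cast<DWORD_PTR>(mask)) == 0) {
        return false;
    }
    // Use the lowest-numbered processor in the node's mask as the ideal one.
    DWORD ideal = 0;
    while (((mask >> ideal) & 1) == 0) {
        ++ideal;
    }
    return SetThreadIdealProcessor(self, ideal) != static_cast<DWORD>(-1);
#else
    (void)node;
    return false;
#endif
}

// Allocate a buffer and write every byte, so that the calling thread is the
// first one to touch the pages.
char* allocate_and_touch(std::size_t bytes) {
    assert(bytes > 0);
    char* buffer = new char[bytes];
    std::memset(buffer, 0, bytes);
    return buffer;
}
```

The intended order is to call `pin_current_thread_to_node` first and `allocate_and_touch` second, on the thread that will use the data. Note that a general-purpose heap may hand back pages that were already touched earlier, so for large regions a fresh allocation is the safer assumption.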

Brian
While researching an answer to a different question, I found a presentation at WinHEC stating that Windows now uses IdealProcessor: http://download.microsoft.com/download/a/f/d/afdfd50d-6eb9-425e-84e1-b4085a80e34e/SVR-T331_WH07.pptx
Brian