Our application runs on a dual-Xeon server with memory configured as 12 GB local to each processor and a memory bus connecting the two Xeons. For performance reasons, we want to control which node a large (>6 GB) block of memory is allocated on. Below is simplified code:

DWORD processorNumber = GetCurrentProcessorNumber();
UCHAR nodeNumber = 255;
GetNumaProcessorNode((UCHAR)processorNumber, &nodeNumber);

// Get the amount of physical memory available on the node.
ULONGLONG availableMemory = MAXLONGLONG;
GetNumaAvailableMemoryNode(nodeNumber, &availableMemory);

// Make sure we don't request too much; the initial limit is 75% of available memory.
_allocateAmt = qMin(requestedMemory, availableMemory * 3 / 4);

// Allocate the cache region now.
HANDLE handle = GetCurrentProcess();
cacheObject = (char*)VirtualAllocExNuma(handle, NULL, _allocateAmt,
                                        MEM_COMMIT | MEM_RESERVE,
                                        PAGE_READWRITE | PAGE_NOCACHE,
                                        nodeNumber);

As written, the code works correctly with VS2008 on Windows 7 x64.

In our application this block of memory functions as a cache store for static objects (1–2 MB each) that are normally stored on the hard drive. My problem is that when we transfer data into the cache area using memcpy, it takes more than 10 times as long as when the destination is allocated with new char[xxxx], with no other code changes.

We are at a loss to understand why this is happening. Any suggestions as to where to look?

+3  A: 

PAGE_NOCACHE is murder on perf; it disables the CPU cache for the region. Was that intentional?

Hans Passant
No, that is not what I intended. I thought it disabled disk caching of the memory block, not the CPU cache. Removing it solved most of my performance issues. Thanks.
photo_tom