numa

Do you anticipate the CLR adopting NUMA soon?

NUMA seems promising for parallel programming, and if I am not mistaken the latest CPUs, such as the i7, have built-in support for it. Do you anticipate the CLR adopting NUMA soon? EDIT: By this I mean supporting it and taking advantage of it. ...

Nehalem Xeon performance on 32-bit OS, XP vs 2003

I have to run 32-bit code on WinXP or Win2003. Nehalem Xeons (5500 series) should be the fastest, but I'm not sure what will happen with the memory arrangement. I'm unsure about two parts: To get the fastest memory setup, I'll need to install at least 6 GB of RAM (to give each CPU 3 sticks to work with). Is the memory interleaved in suc...

Get node distance (hops) in NUMA systems

Is there any API/way to get the "distance" (called 'hops' in literature) between two NUMA nodes? I want to implement a memory allocation system that takes advantage of this (reuse memory from the nearest node, because the access is faster). Windows doesn't seem to have such a feature... and libnuma (under Linux) doesn't seem to have it t...

Starting processes at same time is slower than staggering; why?

I'm evaluating the performance of an experimental system setup on an 8-core machine with 16GB RAM. I have two main-memory Java RDBMSs (hsqldb) running, and against each of these I run a TPCC client (derived from jTPCC/BenchmarkSQL). I have scripts to launch things, so e.g. the hsqldb instances are started with: ./hsqld.bash 0 & ./hsqld...

How to use GetNumaProximityNode (Win7+)?

Starting with Win7/Server2008R2 the GetNumaProximityNode(Ex) function is available. It should help retrieve the distance between NUMA nodes, but I can't understand from the documentation (http://msdn.microsoft.com/en-us/library/ms683206(VS.85).aspx) how it's supposed to work. It says that you give it a distance, and it returns the corres...

Does gcc, icc, or Microsoft's C/C++ compiler support or know anything about NUMA?

If I have a multi-processor board that has cache-coherent non-uniform memory access (NUMA), i.e. separate "northbridges" with separate RAM for each processor, does any compiler know how to automatically spread the data across the different memory systems such that processes working on local threads are mostly retrieving their data from...

Mapping of memory addresses to physical modules in Windows XP

I plan to run 32-bit Windows XP on a workstation with dual processors, based on Intel's Nehalem microarchitecture, and triple channel RAM. Even though XP is limited to 4 GB of RAM, my understanding is that it will function with more than 4 GB installed, but will only expose 4 GB (or slightly less). My question is: Assuming that 6 GB of ...

NUMA memory regions allocation in Windows 7

Our application is: Hardware configuration is a dual Xeon server running 64-bit Windows 7. Each Xeon has its own 12 GB of RAM in a NUMA configuration with a bridge connecting the two memory regions together. All software is written using VS2008 in C++ and compiled as 64-bit applications. A Generation app creates a large shared memory...

Memory access time slow with VirtualAllocExNuma on Windows 7/64

In our application we are running on a dual Xeon server with memory configured as 12 GB local to each processor and a memory bus connecting the two Xeons. For performance reasons, we want to control where we allocate a large (>6 GB) block of memory. Below is simplified code - DWORD processorNumber = GetCurrentProcessorNumber(); UCHAR ...