views: 468
answers: 8

I am working on a project involving optimizing energy consumption within a system. Part of that project consists of allocating RAM based on locality, that is, allocating a program's memory segments as close to each other as possible. Is there a way to know exactly where the memory I allocate is physically located (on which memory chips), and is it possible to force allocation in a deterministic manner? I am interested in both Windows and Linux. Also, the project will be implemented in Java and .NET, so I am interested in managed APIs to achieve this.

[I am aware that this might not translate into direct energy consumption reduction but the project is supposed to be a proof of concept.]

+1  A: 

In C/C++, if you cast a pointer to an integer, that gives you the address. However, under Windows and Linux this is a virtual address: the operating system determines the mapping to physical memory, and the memory management unit in the processor carries it out.

So, if you care where your data is in physical memory, you'll have to ask the OS. If you just care whether your data is in the same MMU block, check the OS documentation to see what block size it's using (4 KB is the usual page size on x86; large/huge pages of 2 MB or more are also available on modern hardware).

Java and .NET add a third layer to the mix, though I'm afraid I can't help you there.

comingstorm
A: 

I think that if you want such tight control over memory allocation you are better off using a compiled language such as C. The JVM isolates the actual implementation of the language from the hardware, chip selection for data storage included.

Alon
A: 

In .NET there is a COM interface exposed for profiling .NET applications that can give you detailed address information. I think you will need to combine this with some calls to the OS to translate virtual addresses though.

As zztop alluded to, the .NET CLR compacts memory every time a garbage collection is done, although large objects are not compacted: they live on the large object heap, which can consist of many segments scattered around by OS calls to VirtualAlloc.

Here are a couple links on the profiling APIs:

http://msdn.microsoft.com/en-us/magazine/cc300553.aspx

http://blogs.msdn.com/davbr/default.aspx

AaronLS
+1  A: 

Is pre-allocating in bigger chunks (than needed) an option at all? Will it defeat the original purpose?

blispr
+13  A: 

You're working at the wrong level of abstraction.

Java (and presumably .NET) refers to objects using handles, rather than raw pointers. The underlying Java VM can move objects around in virtual memory at any time; the Java application doesn't see any difference.

Win32 and Linux applications (such as the Java VM) refer to memory using virtual addresses. There is a mapping from virtual address to a physical address on a RAM chip. The kernel can change this mapping at any time (e.g. if the data gets paged to disk then read back into a different memory location) and applications don't see any difference.

So if you're using Java and .NET, I wouldn't change your Java/.NET application to achieve this. Instead, I would change the underlying Linux kernel, or possibly the Java VM.

For a prototype, one approach might be to boot Linux with the mem= parameter to restrict the kernel's memory usage to less than the amount of memory you have, then look at whether you can mmap the spare memory (maybe by mapping /dev/mem as root?). You could then change all calls to malloc() in the Java VM to use your own special memory allocator, which allocates from that free space.

For a real implementation of this, you should do it by changing the kernel and keeping userspace compatibility. Look at the work that's been done on memory hotplug in Linux, e.g. http://lhms.sourceforge.net/

user9876
I would also add that, under a NUMA-aware OS, the default allocator will aggressively attempt to allocate memory local to the node (which will likely result in improved power draw if that leads to the other node(s) powering down). Note that from Vista onwards even non-server versions of Windows support NUMA...
ShuggyCoUk
This is inaccurate: you can get non-pageable memory at a fixed address using ExAllocatePoolWithTag with the NonPagedPool pool type in Windows, and mmap with the MAP_LOCKED and MAP_FIXED flags under Linux. You cannot choose the physical address, but the memory is at least non-pageable and non-movable. To overcome the Java/.NET isolation problem, you can do this in a native linked library and return a pointer to the memory as a byte array to the respective VM; then, with a lot of work, you can read and write data in this blob of memory by writing your own "memory manager".
Lorenzo Boccaccia
A: 

This approach requires specialized hardware. Ordinary memory sticks and slot arrangements are designed to dissipate heat as evenly per chip as possible — for example, by placing one bit of every bus word on each physical chip.

RocketSurgeon
A: 

This is an interesting topic, although I think it is waaaaaaay beyond the capabilities of managed languages such as Java or .NET. One of the major principles of those languages is that you don't have to manage the memory, so they abstract it away from you. C/C++ gives you better control over actually allocating that memory, but even then, as referenced previously, the operating system can do some hand-waving and indirection with memory allocation, making it difficult to determine how things are physically placed together. Referencing the actual chips is harder still, and I would imagine hardware-dependent.

I would seriously consider using a prototyping board where you can code at the assembly level and control every memory allocation explicitly, without any interference from compiler optimizations or operating system security practices. That would give you the most meaningful results, since you could control every aspect of the program and determine definitively that any power consumption improvements are due to your algorithm rather than some invisible optimization performed by the compiler or operating system. I imagine this is some sort of research project (very intriguing), so spending ~$100 on a prototyping board would definitely be worth it in my opinion.

Chris Thompson
+2  A: 

If you want to try this in a language with a big runtime you'd have to tweak the implementation of that runtime or write a DLL/shared object to do all the memory management for your sample application. At which point the overall system behaviour is unlikely to be much like the usual operation of those runtimes.

The simplest, cleanest test environment to detect the (probably small) advantages of locality of reference would be in C++ using custom allocators. This environment will remove several potential causes of noise in the runtime data (mainly the garbage collection). You will also lose any power overhead associated with starting the CLR/JVM or maintaining its operating state - which would presumably also be welcome in a project to minimise power consumption. You will naturally want to give the test app a processor core to itself to eliminate thread switching noise.

Writing a custom allocator to hand you pieces of a preallocated chunk on your current page shouldn't be too tough, but given that to accomplish locality of reference in C/C++ you would ordinarily just use the stack, it seems unlikely there will be one you can simply find, download and use.

Mike Burrows