Virtual memory is well supported by modern hardware, but application-level memory allocation is still implemented entirely in software, be it manual heap memory management à la C and C++, or VM-level garbage collection.

Going beyond the classic manual-memory-management-versus-garbage-collection debate: why aren't there hardware-based primitives that could help build efficient memory management and/or garbage collection schemes in user space (possibly exposed by, or built into, the OS, and then various VMs)?

Hardware is used to accelerate graphics operations and offload various parts of a network stack, and cryptographic algorithms and audio/video codecs are frequently implemented in hardware. Why can't building blocks for higher-level memory management be? The need seems ubiquitous, yet I don't know of any hardware-assisted implementations.

Given my lack of hardware knowledge this is a bit of a murky area to me, but I'm interested to hear

  1. whether such a thing exists at all (at least at the research stage), or
  2. whether it would give any benefit over conventional memory management, or alternatively
  3. why it is not feasible to build such a thing in hardware?
+8  A: 

You could in theory implement a complete Java VM, including memory management, in hardware, and I believe there are some research projects that (try to) do this. But there are several good reasons not to implement things in hardware:

  • hardware is fixed: you can't easily patch bugs or roll out newer/better algorithms
  • hardware is expensive: for complex operations such as garbage collection you'll need a lot of it, whereas a software implementation using existing hardware resources is much cheaper
  • hardware resources take up die area and consume (static) power even while not in use, whereas unused software code does relatively little harm

In the end, for each feature you want, you have to weigh these costs against the gain you get (faster or lower-powered execution).

For memory management, which typically involves complex algorithms that don't run all that often, the gains will be rather small (you might be able to speed up garbage collection by 10x, but if it only took 1% of execution time to begin with, why bother?). The cost, on the other hand, is a much bigger chip where much of the area is wasted because it's inactive most of the time.

Wim
Actually, what got me thinking about this is the fact that most programs tend to actively avoid dynamic memory allocation on a hot path, and a lot of optimisation guides will tell you the same. Could this be the reason memory management currently takes only the (claimed) 1% of execution time? If this weren't an issue, we could use a more uniform memory model for almost everything, without compromising code maintainability for performance. I wish I had a dollar for every time I see yet another re-implementation of malloc or a memory pool.
Alex B
PS: I agree that not all types of systems may need this (e.g. some mobile or low-end embedded systems).
Alex B
That 1% was proverbial, I don't know the actual numbers ;-) I think the reason for avoiding malloc() in critical paths is that it can fail, or lead to swapping or other non-deterministic behavior, rather than pure performance.
Wim
And maybe the fact that you see so many malloc implementations also shows that the algorithms used are too specific or change too often, again a reason not to put them in HW (yet). But you're right in your other comment, that there's a good middle way, i.e. hardware-accelerated primitives while keeping the high-level stuff in SW.
Wim
+2  A: 

Yes, there were several CPUs which had memory management and GC built in. One was a custom version of the N320xx CPU that powered the Ceres workstation. It used 34-bit memory (i.e. 32 data bits + 2 extra bits).

There are several reasons why there is little hardware support for GC today:

  1. You need a special mainboard -> expensive
  2. You need a special CPU -> very expensive
  3. You need special software that can use the extra features of the CPU and mainboard
  4. There is still a lot of research going on into how to make GC more efficient. This is a very active area, comparable to the time when we drew images by setting individual pixels. Once we learn which parts can be standardized, it will make sense to build hardware for them.
  5. It would waste memory for all programs which don't use this feature

[EDIT] The next generation of "general-purpose CPUs" will probably come with a programmable area (an FPGA) where you can define new "assembler opcodes". That would allow software to modify the CPU to its specific needs. The problem to solve here is making the loading of the FPGA fast enough that its contents can be switched along with processes.

This would allow creating hardware support for

Aaron Digulla
This somewhat confirms my suspicions, but, for example, graphics cards' capabilities are hardly set in stone either; there is a lot of research there as well, yet graphics acceleration hardware is made and sold at commodity prices and constantly evolves.
Alex B
Well, graphics cards started with pretty basic stuff like lines and filling polygons. Also, today's cards aren't magic; they are just massively parallel multi-core CPUs (very simple ones). Hardly a new concept. But since the evolution is exponential, we see more new things faster. When the first FPGA CPUs are introduced, we'll see a similar game with custom CPU code. The first versions will be slow and ugly, everyone will whine, and after three years we'll wonder how we ever lived without it :-)
Aaron Digulla
+2  A: 

Modern processors like the Pentium do have features which support virtual memory management. But the implementation has to be done by the OS, because there are so many possible algorithms for how memory could be managed.

Which algorithm fits best depends on how the memory is used: which kinds of applications run on the computer? How long do they run? How many applications run at once? And how are task switches organized?

That is why you cannot hardwire this in hardware. The operating system knows better how to efficiently manage memory, since it is made for a specific type of computer (server vs. desktop OS), and it also has a higher-level view of the processes running on your computer.

codymanix
The way I see it implemented is not completely handled by the CPU, but rather the CPU providing a few well-defined low-level primitives to help the OS implement it (just as virtual memory isn't magically handled in one monolithic module by the CPU; the OS still has to do the work).
Alex B
Exactly. The processor supports virtual memory management by having a built-in MMU (memory management unit), which provides a low-level API that enables an OS to implement a high-level API.
codymanix
+1  A: 

In the embedded space, Ajile Systems Inc. (http://www.ajile.com/) produces a series of JVM-on-a-chip products which feature optional GC. They also offer a multiple-JVM feature where Java processes execute independently on their own VMs in a deterministic, time-sliced schedule with full memory protection.

They seem to offer three GC algorithms and an off mode. So it's not only a JVM on a chip, but more like an OS, of sorts, on a chip.

Don Mackenzie
This is somewhat along the lines of what I had in mind, but not as general.
Alex B
+1  A: 

There are so many different algorithms and approaches to this problem that nobody has yet identified any primitives common to them all.

Vilx-
While I agree, there are also a few canonical allocator implementations (it's not as if the glibc allocator gets changed every minor release). Hardware allocators could evolve too, since we seem to get a new CPU generation every couple of years (case in point: see my comment on the other answer about graphics cards).
Alex B
+2  A: 

We had a lot of this hardware stuff in the '70s and '80s of the last millennium. All those Lisp machines were pretty good at trying to help memory management with indirect and doubly indirect access (required if your GC moves objects around). Some of us also remember the early days of the 80286, when people thought segments could be used for better memory management and failed terribly on performance.

The current wisdom is that it is much better to optimize CPUs for general-purpose usage than to add special features that are needed only from time to time.

Modern garbage collectors already use some operating system features, like the dirty markings of virtual pages, to implement write barriers, but other than that the algorithms are pretty simple, straightforward, and high-level. There isn't really any special hardware required.

I just recently found an amazing result when using HP-UX. You can set the virtual page size to 256MB, which effectively turns off the virtual memory overhead. This gave a 120% performance increase on this CPU. TLB misses are really serious, even more so than cache misses. This makes me think of the good old MIPS architecture, which stored a process ID in the TLB so it did not require a complete TLB flush on each process switch.

There is still a lot of room for memory management improvements that are more important than high-level garbage collection features.

Lothar