views: 156
answers: 8

I need to allocate and free lots of fixed-size, small (16-byte) blocks of memory, in no fixed order. I could just call malloc and free for each, but that would probably be very inefficient. A better solution would probably be to malloc and free larger blocks, and handle the allocation within those blocks myself.

The question is, how best to do this?

It seems that this should not be a very unusual or rare problem, and that it should have been "solved", but I can't seem to find anything. Any pointers?

To clarify, I am aware that memory pool libraries and whatnot exist, but these also take a size parameter. If the size is constant, different options for more efficient algorithms become available. Are there any implementations of these?

+2  A: 

Before embarking on the onerous task of rewriting malloc, the standard advice applies: profile your code, and make sure that this is actually a problem!

Oli Charlesworth
+2  A: 

The best way to approach this is not to assume it will be inefficient. Instead, try out the solution with malloc, measure the performance, and prove that it is or isn't efficient. Only once it's proved to be inefficient (it likely won't be) should you move to a custom allocator. Without the proof, you'll never know whether your solution is actually faster.

JaredPar
+2  A: 

What you're looking for is called a memory pool. There are existing implementations, although it's not hard (and good practice) to make your own.

The simplest implementation for a pool of same-sized data is just a wrapper containing a buffer of n*size, and a stack of n pointers. "malloc" from the pool pops the pointer off the top. "free" to the pool puts the pointer back into the stack.
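A minimal sketch of that wrapper in C (all names here are illustrative, not from any existing library):

```c
#include <stdlib.h>

/* Fixed-size pool: a buffer of n cells plus a stack of free pointers. */
typedef struct {
    unsigned char *buf;   /* n * cell_size bytes of raw storage */
    void  **stack;        /* stack of currently-free cell pointers */
    size_t  top;          /* number of entries on the stack */
} pool;

int pool_init(pool *p, size_t n, size_t cell_size) {
    p->buf   = malloc(n * cell_size);
    p->stack = malloc(n * sizeof *p->stack);
    if (!p->buf || !p->stack) { free(p->buf); free(p->stack); return -1; }
    for (size_t i = 0; i < n; i++)          /* every cell starts free */
        p->stack[i] = p->buf + i * cell_size;
    p->top = n;
    return 0;
}

void *pool_alloc(pool *p) {                 /* pop a pointer, or NULL */
    return p->top ? p->stack[--p->top] : NULL;
}

void pool_free(pool *p, void *cell) {       /* push the pointer back */
    p->stack[p->top++] = cell;
}

void pool_destroy(pool *p) { free(p->buf); free(p->stack); }
```

Allocation and deallocation are both O(1), and there is no per-cell header; the cost is the up-front pointer stack of n entries.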

patros
This does not allow me to free large blocks at once, nor allocate them, unless I have a way of telling whether the entire block is referenced from different points in the linked list. This does require some implementation thought, and I'm looking to see if there is a reasonable solution I can just plug in.
mmmmalloc
+2  A: 

For your requirement, your custom allocator would be really simple. Just calloc a large array of memory:

calloc(N, 16)

and then hand out array entries. To track which array locations are in use, you could use a simple bitmap; then with a few clever bit operations and some pointer subtraction, your custom malloc/free operations should be pretty easy. If you run out of space you can just realloc more, but having a suitable fixed default would be a little easier.
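A hedged illustration of that bitmap scheme (sizes and names are made up; a real version would want a faster bit scan than this linear search):

```c
#include <stdlib.h>
#include <string.h>

#define CELL   16
#define NCELLS 1024

static unsigned char *arena;               /* NCELLS * CELL bytes */
static unsigned char  used[NCELLS / 8];    /* one bit per cell    */

int arena_init(void) {
    arena = calloc(NCELLS, CELL);          /* note: calloc(count, size) */
    memset(used, 0, sizeof used);
    return arena ? 0 : -1;
}

void *cell_alloc(void) {
    for (size_t i = 0; i < NCELLS; i++)
        if (!(used[i / 8] & (1u << (i % 8)))) {  /* first clear bit */
            used[i / 8] |= 1u << (i % 8);
            return arena + i * CELL;
        }
    return NULL;                           /* out of cells */
}

void cell_free(void *p) {
    /* pointer subtraction recovers the cell index */
    size_t i = ((unsigned char *)p - arena) / CELL;
    used[i / 8] &= ~(1u << (i % 8));
}
```

Freeing is O(1); allocating is O(n) in the worst case with this naive scan, which is one reason the free-list approach in other answers is usually preferred.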

Though really, you should just use malloc first. malloc maintains pools of free memory blocks of different sizes; I would bet that there is a pool for 16-byte blocks (different implementations may or may not do this, but it's a pretty common optimization), and since all your allocations are the same size, fragmentation should not be an issue. (Plus, debugging your own allocator might be a bit of a nightmare.)

luke
+4  A: 

You're right, it's a common problem [Edit: how to do fixed-size allocation, I mean. "malloc is slowing down my application" is less common than you might think].

If your code is too slow and malloc a plausible culprit, then a simple cell allocator (or "memory pool") might improve things. You can almost certainly find one somewhere, or it's easy to write:

Allocate a large block, and put a singly-linked list node at the start of each 16-byte cell. Link them all together. To allocate, take the head off the list and return it. To free, add the cell to the head of the list. Of course if you try to allocate and the list is empty, then you have to allocate a new large block, divide it into cells, and add them all to the free list.

You can avoid that big up-front work if you want. When you allocate a big block, just store a pointer to the end of it. To allocate, move your pointer back 16 bytes through the block and return the new value. Unless it was already at the start of the block[*], of course. If that happens, and the free list is also empty, you need a new large block. Free doesn't change - just add the node to the free list.
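One possible sketch of that lazy scheme in C, checking the free list first and bumping forward through the block rather than backward (equivalent either way). Everything here is illustrative, not a reference implementation:

```c
#include <stddef.h>
#include <stdlib.h>

#define CELL            16
#define CELLS_PER_BLOCK 256

typedef struct node  { struct node  *next; } node;  /* lives inside a free cell */
typedef struct block { struct block *next; } block; /* header linking all blocks */

static node  *free_list = NULL;          /* LIFO list of freed cells */
static block *blocks    = NULL;          /* every large block, for teardown */
static unsigned char *bump, *bump_end;   /* unissued tail of newest block */

void *cell_alloc(void) {
    if (free_list) {                     /* check the free list first */
        node *n = free_list;
        free_list = n->next;
        return n;
    }
    if (bump == bump_end) {              /* need a fresh large block */
        block *b = malloc(sizeof(block) + CELLS_PER_BLOCK * CELL);
        if (!b) return NULL;
        b->next = blocks; blocks = b;
        bump     = (unsigned char *)(b + 1);
        bump_end = bump + CELLS_PER_BLOCK * CELL;
    }
    void *p = bump;                      /* deal a cell off the block */
    bump += CELL;
    return p;
}

void cell_free(void *p) {                /* push onto the free list */
    node *n = p;
    n->next = free_list;
    free_list = n;
}

void allocator_destroy(void) {           /* release all blocks at once */
    while (blocks) { block *b = blocks; blocks = b->next; free(b); }
    free_list = NULL; bump = bump_end = NULL;
}
```

Note the list node occupies the cell only while it is free, so an allocated cell carries no overhead at all, as described above.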

You have an option whether to deal out of the block first, and check the free list if that's exhausted, or to check the free list first, and deal off the block if that's empty. I don't know which tends to be faster - the good thing about a last-in-first-out free list is that it's cache-friendly, since you're using memory that was used recently, so I'd probably try that first.

Note that the list node is not needed while the cell is allocated, so there's essentially zero overhead per cell. Quite aside from speed, this is likely to be an advantage over malloc, or other general-purpose allocators.

Do be aware that dropping the whole allocator is pretty much the only way to release memory back to the system. So users who plan to allocate a lot of cells, use them, and free them all should create their own allocator, use it, and then destroy it - both for performance (you don't have to free each cell individually) and to prevent the fragmentation-style effect where a whole block must be kept if any of its cells are in use.

If you can't do this, your memory use will be the high-water mark over the time your program has been running. For some programs that's a problem (for instance a long-running program with occasional big spikes in memory use, on a system where memory is constrained). For others it's absolutely fine (for instance if the number of cells in use increases until very near the end of the program, or fluctuates within a range where you really don't mind using more memory than you strictly need). For some it's actively desirable (if you know how much memory you're going to use, you can allocate it all up-front and not have to worry about failures). For that matter, some implementations of malloc have difficulty releasing memory from the process back to the OS.

[*] Where "start of the block" probably means "the start of the block, plus the size of some node used to maintain a list of all blocks, so they can all be freed when the cell allocator is destroyed".

Steve Jessop
`malloc` may be a culprit but rushing to a new solution is the wrong idea. Optimizing before you measure is wrong in all cases. How do you know what you're fixing if you don't measure?
JaredPar
@JaredPar: What would you measure if you're not comparing against anything else? `malloc` is a plausible culprit under two conditions: either the profiler shows that a lot of time is spent there, or it has been every single time you've written code like this in the past. A simple cell allocator takes all of half an hour to write (or if you've been there before you fetch the last one you wrote). I'm not rushing to anything. By all means say, "profile first", but if you don't then say what to do if the profile shows malloc is a problem, you have not in fact answered the question.
Steve Jessop
@Steve, you hookup a profiler and see where the time in your application is being spent. If it's malloc then investigate a replacement or investigate your usage. But it could just as likely be function `foo` where you had a typo or bad algorithm that is taking up the time. Replacing malloc without measuring is rushing to a solution. If you don't measure you simply don't know what you're fixing.
JaredPar
Yes, that's what I just said, and if you read my answer you'll see that it's all conditional on `malloc` use being a plausible optimisation target. Writing an allocator without good reason is rushing to a solution. *Telling someone how to write an allocator* without demanding that they first submit to me in triplicate a proof that their application needs one, is not rushing to a solution. It's providing information freely.
Steve Jessop
@JaredPar: Beyond issues of speed, if two pieces of code are frequently allocating and freeing things which are different sizes, severe memory fragmentation can occur. Code which makes heavy use of malloc and free may thus be 'brittle'. It's not always possible to predict when bad things will happen, so it may be wise to avoid malloc() for small objects with a short lifetime.
supercat
@supercat, true but a simple memory profiler can prove or disprove that you are fragmenting memory. Why guess when you can be sure?
JaredPar
@Steve, I'm much more in favor of a stronger push for the OP to challenge their assumptions about what the cause is. In my experience developers are terrible at predicting performance (irrespective of skill), but most think they're great at it, and hence end up spending lots of time fixing issues that don't exist. Even if malloc is a plausible culprit, it's still irresponsible to implement a fix without measuring, because you simply don't know what you're fixing.
JaredPar
@JaredPar: OK, well you do the thing where every question with "performance" or "speed" in it gets a standard answer telling the questioner that they shouldn't have asked the question until they've done their homework. I've (non-seriously) suggested before that this should be built into the SO software, but I've mostly lost interest in making it any more than a comment. I'll provide an answer for them to use if they get done with that ;-)
Steve Jessop
@JaredPar - The C code is generated from some input data. It is compiled, run, does exciting data structure manipulations, and returns. About 10% of the lines are mallocs or frees. It is a pretty safe bet that that is a bottleneck.
mmmmalloc
@mmmmalloc: you have roughly 3 options to collect on that bet. (1) run with a profiler, and see what proportion of time is spent in `malloc`. (2) replace `malloc` and demonstrate that your program runs faster. (3) work out how many mallocs the program does, and time a loop that does the same number. (1) has the enormous advantage that it tells you the time cost of everything, not just `malloc`. (2) and (3) have the disadvantage that if `malloc` is not a major time cost, you've wasted your effort. Unless your platform doesn't have a profiler, what is the advantage of guessing?
Steve Jessop
In particular: "exciting data structure manipulations" is pretty much exactly what `malloc` does too. Unless you have a good idea how much manipulation your code does vs. how much `malloc` does on your system, there's really no way to guess whether `malloc` is 99% of time or 1% of time.
Steve Jessop
+1  A: 

You could try overriding malloc/free with an alternative implementation that's suited to lots of small allocations.

paleozogt
using dlmalloc naively like that would incur a 50% to 100% memory overhead. Painful.
mmmmalloc
@mmmmalloc: If you're talking about the 8 or 16 byte per-allocation header then no it doesn't, because your system `malloc` already has an equivalent overhead per allocation.
Steve Jessop
Which is why I want a better solution...
mmmmalloc
A: 

Wilson, Johnstone, Neely, and Boles wrote a nice paper surveying all sorts of different allocators.

In my experience, the performance and overhead difference between a good fixed pool allocator and just relying on dlmalloc can be massive in cases where you're making lots and lots of short-lived small allocations in a limited address space (such as a system with no page file). In the app I'm working on at this moment, our main loop jumps from 30ms to >100ms if I replace our block allocator with simple calls to malloc() (and it eventually crashes due to fragmentation).

Crashworks
A: 

The following code is pretty ugly, but its purpose is not beauty; it's to find out how big the block allocated by malloc is.
I asked for 4 bytes, and malloc requested and received 135160 bytes from the OS.

#include <stdio.h>
#include <stdlib.h>


int main()
{
  int* mem = (int*) malloc( sizeof(int) ) ;
  if(mem == 0) return 1;
  long i=1L;

  while(i)
    {
      mem[i-1] = i;   /* deliberately writes past the allocation until it crashes */
      printf("block is %ld bytes\n", (long)(sizeof(int) * i++));
    }//while

  free(mem);
  return 0 ;
}

$ g++ -o file file.cpp
$ ./file
...
block is 135144 bytes
block is 135148 bytes
block is 135152 bytes
block is 135156 bytes
block is 135160 bytes
Segmentation fault

This malloc is serious business.
realloc doesn't make any system call if the requested size is less than what it already has available, due to internal pooling.
After realloc copies the memory to a larger zone, it does not destroy the previous block, nor return it to the system immediately; it can still be accessed (totally unsafe, of course). Given all this, it doesn't make sense to me why someone would need an additional memory pool.

grayasm