views: 668
answers: 6

I'm looking for ideas for a heap-manager to handle a very specific situation: Lots and lots of very small allocations, ranging from 12 to 64 bytes each. Anything bigger, I will pass on to the regular heap-manager, so only tiny blocks need be catered for. Only 4-byte alignment is needed.

My main concerns are

  1. Overhead. The regular libc heap will typically round up an allocation to a multiple of 16 bytes, then add a 16-byte header - that's over 50% overhead on a 20-byte allocation, which sucks.
  2. Performance

One helpful aspect is that Lua (which is the user of this heap) will tell you the size of the block it's freeing when it calls free() - this may enable certain optimisations.

I'll post my current approach, which works ok, but I'd like to improve on it if at all possible. Any ideas?

+6  A: 

It is possible to build a heap manager that is very efficient for objects that are all the same size. You could create one of these heaps for each size of object that you need, or if you don't mind using a bit of space, create one for 16 byte objects, one for 32, and one for 64. The maximum overhead would be 31 bytes for a 33 byte allocation (which would go on the 64 blocksize heap).
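One way to sketch the size-class routing this answer describes: round each request up to the nearest of the three pool sizes, and hand anything bigger to the regular heap. The helper name and the return-0 convention are illustrative, not from the answer.

```c
#include <stddef.h>

/* Hypothetical helper: map a request size to one of the three pool
   sizes (16, 32, 64) suggested above. Returning 0 signals "too big,
   forward to the regular heap-manager". */
static size_t pool_class(size_t n) {
    if (n <= 16) return 16;
    if (n <= 32) return 32;
    if (n <= 64) return 64;
    return 0;  /* fall through to the system allocator */
}
```

With this mapping, the worst case is indeed a 33-byte request landing in the 64-byte pool, for 31 bytes of slack.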

Greg Hewgill
Yep. I'd probably go as low as 4-byte granularity, but this sounds like what I need. So this could be done with entirely headerless blocks, right?
Menkboy
Yes, no headers are needed. An allocated block needs no header, and a free block needs only a pointer to the next free block.
Greg Hewgill
Sweet. Presumably GCing it to reclaim memory would be headache, but I'll see if I can get away with simply avoiding that, or maybe try to do it incrementally.
Menkboy
+1  A: 

The answer may depend on the lifetime patterns for these objects. If the objects are all instantiated as you proceed, and then all removed in one fell swoop, it may make sense to create a very simple heap manager that allocates memory by simply incrementing a pointer. Then, when you're done, blow away the entire heap.

Raymond Chen made an interesting post that may help to inspire you. :)

Greg D
A: 

If a bunch of memory is allocated, used, and freed before moving on to the next round of allocation, I'd suggest using the simplest allocator possible:

typedef struct allocator_t {
    char* buffer;   /* char* so pointer arithmetic is standard C */
    size_t start;
    size_t max;
} allocator;

void init_allocator(size_t size, allocator* alloc) {
    alloc->buffer = malloc(size);
    alloc->start = 0;
    alloc->max = size;
}

void* allocator_malloc(allocator* alloc, size_t amount) {
    if (alloc->max - alloc->start < amount) return NULL;
    void* mem = alloc->buffer + alloc->start;
    alloc->start += amount;
    return mem;
}

void allocator_free(allocator* alloc) {
    alloc->start = 0;
}
hazzen
Reasonable for C++, but the target language is C.
EvilTeach
Not so hard to convert to C, non?
hazzen
Sorry, should've mentioned that C++ is fine with me; the problem's that Lua churns the heap so heavily that this sort of approach isn't practical, unfortunately.
Menkboy
+5  A: 

To expand on what Greg Hewgill says, one way to do an ultra-efficient fixed-size heap is:

  1. Split a big buffer into nodes. Node size must be at least sizeof(void*).
  2. String them together into a singly-linked list (the "free list"), using the first sizeof(void*) bytes of each free node as a link pointer. Allocated nodes will not need a link pointer, so per-node overhead is 0.
  3. Allocate by removing the head of the list and returning it (2 loads, 1 store).
  4. Free by inserting at the head of the list (1 load, 2 stores).

Obviously step 3 also has to check if the list's empty, and if so do a bunch of work getting a new big buffer (or fail).
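The four steps above can be sketched in a few lines of C. This is a minimal illustration of the headerless scheme, assuming the caller supplies the big buffer; the names are mine, and the node size must be at least sizeof(void*), as step 1 requires.

```c
#include <stddef.h>

/* Headerless fixed-size heap: the first sizeof(void*) bytes of each
   FREE node hold the link to the next free node; allocated nodes
   carry no header at all, so per-node overhead is 0. */
typedef struct fixed_heap {
    void *free_list;   /* head of the singly-linked free list */
} fixed_heap;

/* Step 1+2: carve the buffer into nodes and string them together. */
void fixed_heap_init(fixed_heap *h, void *buffer, size_t node_size, size_t count) {
    char *p = buffer;
    h->free_list = NULL;
    for (size_t i = 0; i < count; i++) {
        *(void **)p = h->free_list;   /* link to previous head */
        h->free_list = p;
        p += node_size;
    }
}

/* Step 3: pop the head of the free list (2 loads, 1 store). */
void *fixed_heap_alloc(fixed_heap *h) {
    void *node = h->free_list;
    if (node) h->free_list = *(void **)node;
    return node;   /* NULL when the list is empty */
}

/* Step 4: push the node back on the head (1 load, 2 stores). */
void fixed_heap_free(fixed_heap *h, void *node) {
    *(void **)node = h->free_list;
    h->free_list = node;
}
```

A real version would grab a fresh big buffer (or fail) when fixed_heap_alloc returns NULL, as noted above.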

Even more efficient, as Greg D and hazzen say, is to allocate by incrementing or decrementing a pointer (1 load, 1 store), and not offer a way to free a single node at all.

Edit: In both cases, free can deal with the complication of "anything bigger, I pass on to the regular heap-manager" thanks to the helpful fact that you get the size back in the call to free. Otherwise you'd be looking at either a flag (probably 4 bytes of overhead per node) or else a lookup in some kind of record of the buffer(s) you've used.
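Since Lua hands the old size back on every free, the dispatch needs no per-block flag or buffer lookup. A hedged sketch, with the pool side stubbed out by a counter so it stays self-contained (the 64-byte cutoff comes from the question; the function name is illustrative):

```c
#include <stdlib.h>
#include <stddef.h>

static int tiny_frees = 0;   /* stand-in for the real pool code */

/* osize is the size Lua reports for the block being freed. */
void route_free(void *ptr, size_t osize) {
    if (ptr == NULL) return;
    if (osize <= 64) {
        tiny_frees++;   /* would push ptr onto the matching pool's free list */
    } else {
        free(ptr);      /* big blocks came from the regular heap */
    }
}
```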

Steve Jessop
Thank you very much, please accept an honorary +1 (I don't have a proper voting account).
Menkboy
PS: I can avoid the free() complications because Lua tells me the blocksize it's freeing. So I can very quickly look up which heap it came from and act accordingly.
Menkboy
Yes, I noticed that too just after I posted, but I'm having connectivity problems so you beat me to it :-(
Steve Jessop
+1  A: 

I like onebyone's answer.

You might also consider The buddy system for your sets of fixed size heaps.

EvilTeach
A: 

I use a mostly O(1) Small Block Memory Manager (SBMM). Basically it works this way:

1) It allocates larger SuperBlocks from the OS and tracks the Start+End Addresses as a range. The size of the SuperBlock is adjustable but 1MB makes a pretty good size.

2) The SuperBlocks are broken into Blocks (also adjustable in size... 4K-64K is good depending on your app). Each of these Blocks handles allocations of a specific size and stores all the items in the Block as a singly linked list. When you allocate a SuperBlock, you make a linked list of Free Blocks.

3) Allocating an Item means A) Checking to see if there is a Block with Free Items handling that size - and if not, allocating a new Block from the SuperBlocks. B) Removing the Item from the Block's Free List.

4) Freeing an Item by address means A) Finding the SuperBlock containing the address(*) B) Finding the Block within the SuperBlock (subtract the SuperBlock start address and divide by the Block size) C) Pushing the Item back on the Block's Free Item list.
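Step 4B is pure arithmetic once the SuperBlock is known. A small sketch, using the 4K Block size the answer suggests as an adjustable example:

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096u   /* adjustable; 4K-64K per the answer */

/* Given the containing SuperBlock's start address, the Block index
   for any item address is an offset divided by the Block size. */
static size_t block_index(uintptr_t superblock_start, uintptr_t addr) {
    return (size_t)((addr - superblock_start) / BLOCK_SIZE);
}
```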

As I stated, this SBMM is very fast, running with O(1) performance(*). In the version I have implemented, I use an AtomicSList (similar to SLIST in Windows), so it is not only O(1) but also thread-safe and lock-free. You could actually implement the algorithm using the Win32 SLIST if you wanted to.

Interestingly, the algorithms for allocating Blocks from the SuperBlocks and Items from the Blocks result in nearly identical code (both are O(1) allocations off a free list).

(*) The SuperBlocks are arranged in a rangemap with O(1) average performance (but a potential O(lg N) worst case, where N is the number of SuperBlocks). The width of the rangemap depends on knowing roughly how much memory you're going to need in order to get the O(1) performance. If you overshoot, you'll waste a bit of memory but still get O(1) performance. If you undershoot, you'll approach O(lg N) performance, but the N is the SuperBlock count -- not the Item count. Since the SuperBlock count is very low compared to the Item count (by about 20 binary orders of magnitude in my code), it is not as critical as the rest of the allocator.

Adisak