views:

609

answers:

7

I am running a simulation with a lot if bunch of initial memory allocations per object. The simulation has to run as quickly as possible, but the speed of allocation is not important. I am not concerned with deallocation.

Ideally, the allocator will place everything in a contiguous block of memory. (I think this is sometimes called an arena?)

I am not able to use a flattened vector because the allocated objects are polymorphic.

What are my options?

+2  A: 

Just make your own.

See an old question of mine to see how you can start:

http://stackoverflow.com/questions/771458/improvements-for-this-c-stack-allocator

Unknown
That's a good start. Thanks. Did you ever write deallocate()?
Neil G
did you ever fix the alignment issues? How do you get the required necessary alignment of the system?
Neil G
@Neil, no I didn't finish deallocate, but it is pretty easy to either ignore it if its all PODS, or walk it sequentially if not. For the alignment issue, this isn't an issue for x86. If it is, just round to the nearest 16 byte alignment.
Unknown
Okay, I guess an empty deallocate would work -- I don't need to recover the memory -- I just have to not crash.As for alignment, I'm going for speed, so I imagine 4 byte alignment is all I need on x86?
Neil G
@Neil, it needs to be 16 byte alignment if you need some SSE instructions I think.
Unknown
@Unknown: While some x86 instructions can operate on unaligned data, they will be slower than using aligned data.
Shmoopty
A: 

The usual technique is fixed-block allocation. See: Lea, Robinson, Knowlton, Grunwald.

Edit: fixed block allocation can indeed leave gaps if there are frequent allocations and deallocations. One on project I worked with where a class might allocate many subobjects of different sizes but had to keep them contiguous, we used a simple memory pool: allocate all the memory needed for all the object's contents at once, and then use placement new to lay them out inside that.

If you don't know in advance how large the object's contents will be, you can write a pooling allocator that allocates memory sequentially; ie, it guarantees

Foo *a = new Foo();
Bar *b = new Bar;
b == ((byte *)(a)) + sizeof(Foo);

This will ensure that all allocations occuring within an object's constructor are contiguous. You'll get big, ragged gaps when the objects are deallocated, so we had to defragment every so often; even so the net speed gain was significant.

Crashworks
I think fixed-block allocation is not what I want. It is fast to allocate, but leaves gaps everywhere that minimize locality of reference. I don't care about allocation speed, but care a lot about locality of reference.
Neil G
We use them to get locality of reference (cache is life on the architecture I deal with); clearly it works better if you're not deallocating objects frequently. You can also defragment objects in the pool at intervals if you need to.
Crashworks
How do you deal with alignment?
Nikolai N Fetissov
In that system we forced word-boundary alignment. Things that needed wider alignment, like SSE vectors, were never new'd individually anyway (they had to be concrete class members), so used 16-byte alignment on them and juggled the order of members in the class to make sure we weren't making big packing gaps.
Crashworks
Oh, I see what you're saying. The answer is yes: classes that contained 16b-aligned objects had to themselves be aligned on a 16b boundary.
Crashworks
A: 

This is a pretty big subject, just take at wikipedia. One concrete example was in Alexandrescu book and should be implemented in his Loki library. GCC also comes with several implementations of std::allocator, just look into your distribution.

Nikolai N Fetissov
A: 

The Fixed-Size Block Allocator suite works fairly well, and is very attractively licensed (MIT).

Reed Copsey
A: 

How about Boost.Pool?

jalf
+1  A: 

Since you don't care about de-allocation you can use a linear allocator. Allocate a huge amount of memory up front and store a pointer to the start. malloc(x) moves the allocation pointer forward by x bytes and returns the old value of the pointer, delete(x) is stubbed out. As mentioned here, another poster already has an implimentation

Allocations are packed as closely as possible, allocations are incredibly fast and memory is returned in the order allocated. When your simulation is done, you just reset the allocator's pointer to the start of memory and clear any pointers you have from outside the allocator to objects inside of it.

Pool allocators are a great choice if you want to delete objects, faster than a heap but won't pack your data into memory as close and aren't quite as fast. Use boost:pool for those. This is a great choice for games if you have x bytes to store say - a level - and you are willing to throw the whole lot away at the same time.

As an aside, if you are interested in memory performance, see What Every Programmer Should Know About Memory-PDF. It covers things like locality of reference and its effect on performance. Specifically, you might want to create a pool for each type of object that is used together, or declare your objects as a struct of arrays rather than a array of structs

Tom Leys
this is exactly what I want, I think. So, I have to write it myself. I know that it's not "hard", but writing it myself means debugging and any mistake might be hard to find.I guess I will start with “Unknown”’s solution, as you noted. Thanks for the PDF.
Neil G
+1  A: 

Dave Hanson's C Interfaces and Implementations includes a very nice arena-based allocator. If memory serves, no metadata is stored with the objects, and they are allocated from contiguous free space, so this is about as much locality as you can hope for.

Norman Ramsey