views: 212
answers: 4
I've got a small class (16 bytes on a 32bit system) which I need to dynamically allocate. In most cases the life-time of any given instance is very short. Some instances may also be passed across thread boundaries.

Having done some profiling, I found that my program appears to be spending more time allocating and deallocating these objects than it actually spends using them, so I want to replace the default new and delete with something a little more efficient.

For a large object (db connections as it happens, which are expensive to construct rather than allocate), I'm already using a pooling system, however that involves a list for storing the "free" objects, and also a mutex for thread safety. Between the mutex and the list it actually performs worse than with the basic new/delete for the small objects.

I found a number of small-object allocators via Google; however, they seem to use a global/static pool that is not accessed in a thread-safe manner, making them unsuitable for my use :(

What other options have I got for efficient memory management of such small objects?

A: 

For many small objects, I suggest using either std::map or std::vector. The std::vector or an array will reduce fragmentation in the dynamic memory area.

Another suggestion is to place the objects into a Singleton class and access them through that class. The Singleton will allow a one-time allocation for the variables and reduce fragmentation of dynamic memory.

These are some techniques we use with Embedded Systems, where memory is often a fixed small size.
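A minimal sketch of that singleton-pool idea (all names here are hypothetical, a fixed 16-byte payload stands in for the real class, and this version is not thread-safe):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 16-byte payload standing in for the poster's class.
struct Small { char data[16]; };

// Singleton that owns one up-front allocation and hands out slots
// from a free list. All storage is allocated exactly once.
class SmallPool {
public:
    static SmallPool& instance() {
        static SmallPool pool(1024);   // one-time allocation of 1024 slots
        return pool;
    }

    Small* acquire() {
        if (!free_.empty()) {
            Small* p = free_.back();
            free_.pop_back();
            return p;
        }
        return nullptr;                // pool exhausted
    }

    void release(Small* p) { free_.push_back(p); }

private:
    explicit SmallPool(std::size_t n) : storage_(n) {
        free_.reserve(n);
        for (Small& s : storage_) free_.push_back(&s);
    }

    std::vector<Small> storage_;       // contiguous backing store
    std::vector<Small*> free_;         // slots currently available
};
```

Because the backing store is one contiguous block, the heap sees a single allocation regardless of how many objects cycle through the pool.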

Thomas Matthews
Neither std::map nor std::vector are thread safe.
jmucchiello
A: 

You might make sure that you are using the low fragmentation heap. It is on by default in Vista and later, but I do not think that is so with earlier OS's. That can make a big difference in allocation speed for small objects.
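On systems where it is not on by default, the low fragmentation heap can be enabled explicitly via the Win32 heap API; a sketch, with error handling omitted and guarded so it only compiles on Windows:

```cpp
#ifdef _WIN32
#include <windows.h>

// Opt the process heap into the Low Fragmentation Heap. This is the
// documented way to enable it on XP/2003; Vista and later enable it
// automatically, where this call is a no-op.
bool enable_lfh() {
    ULONG mode = 2;  // 2 = enable the Low Fragmentation Heap
    return HeapSetInformation(GetProcessHeap(),
                              HeapCompatibilityInformation,
                              &mode, sizeof(mode)) != 0;
}
#endif
```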

Mark Wilkins
Well I'm testing on Vista so in that case it's already on. Worth knowing though since I am supporting XP.
Fire Lancer
It's an interesting issue (more so since I don't have to solve it :). I am curious to see the final solution.
Mark Wilkins
+1  A: 

Maybe try using Google's tcmalloc? It is optimized for fast allocation/deallocation in a threaded program, and has low overhead for small objects.
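Trying it out can be as simple as linking it in, or preloading it without rebuilding (library names and paths vary by install; these are illustrative):

```shell
# Link tcmalloc into the build in place of the default allocator:
g++ -O2 myprog.cpp -o myprog -ltcmalloc

# Or try it on an existing binary via the dynamic loader:
LD_PRELOAD=/usr/lib/libtcmalloc.so ./myprog
```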

Stephen
+1  A: 

Some instances may also be passed across thread boundaries

Only "some"? So perhaps you can afford to pay extra for these, if it makes the ones that don't get passed to other threads cheaper. There are various ways I can think of to get to one allocator per thread and avoid the need to lock when allocating or freeing in the thread to which the allocator belongs. I don't know which might be possible in your program:

  • Copy things across the thread boundary, instead of passing them.

  • Arrange that if they're passed to another thread for any reason, then they're passed back to the original thread to free. This doesn't necessarily have to happen very often; you could queue up a few in the receiving thread and pass them all back in a message later. This assumes of course that the thread which owns the allocator is going to stick around.

  • Have two free lists per allocator, one synchronised (to which objects are added when they're freed from another thread), and one unsynchronised. Only if the unsynchronised list is empty, and you're allocating (and hence in the thread which owns the allocator), do you need to lock the synchronised free list and move all of its current contents to the unsynchronised list. If objects being passed to other threads is rare, this basically eliminates contention on the mutex and massively reduces the number of times it's taken at all.

  • If all the above fails, having one allocator per thread might still allow you to get rid of the mutex and use a lock-free queue for the free list (multiple writers freeing, single reader allocating), which could improve performance a bit. Implementing a lock-free queue is platform-specific.
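The two-free-list scheme could be sketched roughly like this (names and the 16-byte cell size are illustrative, the fallback path is simplified, and blocks are never returned to the system heap):

```cpp
#include <cstdlib>
#include <mutex>
#include <vector>

// Each allocator is owned by exactly one thread. Frees from the owning
// thread touch only the unsynchronised list; frees from other threads
// go through the mutex. The allocation path takes the lock only when
// the local list runs dry, so if cross-thread frees are rare, so is
// contention.
class PerThreadAllocator {
public:
    void* allocate() {
        if (local_free_.empty()) {
            // Drain the remote list in one locked swap.
            std::lock_guard<std::mutex> lock(remote_mutex_);
            local_free_.swap(remote_free_);
        }
        if (!local_free_.empty()) {
            void* p = local_free_.back();
            local_free_.pop_back();
            return p;
        }
        return std::malloc(16);        // fall back to the system heap
    }

    // Called only from the owning thread.
    void deallocate_local(void* p) { local_free_.push_back(p); }

    // Called from any other thread.
    void deallocate_remote(void* p) {
        std::lock_guard<std::mutex> lock(remote_mutex_);
        remote_free_.push_back(p);
    }

private:
    std::vector<void*> local_free_;    // owner thread only, no lock
    std::vector<void*> remote_free_;   // guarded by remote_mutex_
    std::mutex remote_mutex_;
};
```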

Taking a step further back, does your app frequently hit a state in which you know that all cells allocated after a certain point (perhaps a little in the past), are no longer in use? If so, and assuming the destructor of your small objects doesn't do anything terribly urgent, then don't bother freeing cells at all - at the "certain point" create a new allocator and mark the old one as no longer in use for new allocations. When you "hit the state", free the whole allocator and its underlying buffer. If the "certain point" and the "state" are simultaneous, all the easier - just reset your allocator.
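The "reset your allocator" idea could look something like this bump allocator (names are made up; skipping per-object destructors is only safe if they are trivial):

```cpp
#include <cstddef>
#include <vector>

// Hands out fixed-size cells from one big buffer and never frees them
// individually. When the program reaches a state where no cell is
// live, the whole arena is reset (or discarded) in O(1).
class BumpArena {
public:
    explicit BumpArena(std::size_t cells, std::size_t cell_size = 16)
        : buffer_(cells * cell_size), cell_size_(cell_size), offset_(0) {}

    void* allocate() {
        if (offset_ + cell_size_ > buffer_.size())
            return nullptr;            // arena exhausted; start a new one
        void* p = &buffer_[offset_];
        offset_ += cell_size_;
        return p;
    }

    // "Free" everything at once when the quiescent state is reached.
    void reset() { offset_ = 0; }

private:
    std::vector<char> buffer_;
    std::size_t cell_size_;
    std::size_t offset_;
};
```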

Steve Jessop
"after a certain point (perhaps a little in the past), are no longer in use?" unfortunately no, they are used throughout the lifetime of the program. Following on from the one-allocator-per-thread idea: what if I made a pool allocator that had its free-list in some kind of TLS? Then I'd just need an occasional synchronised process to combat the possibility of objects migrating from one thread to another, causing one thread to free more than it allocates?
Fire Lancer
Yep, that sounds plausible. The slower the drift from one thread to another, the less often you have to synchronise and the lower the overhead.
Steve Jessop
You could also take a look at this: http://goog-perftools.sourceforge.net/doc/tcmalloc.html
Steve Jessop