Hi, I currently have a heavily multithreaded server application, and I'm shopping around for a good multithreaded memory allocator.

So far I'm torn between:

- Sun's umem

- Google's tcmalloc

- Intel's Threading Building Blocks allocator

- Emery Berger's Hoard

From what I've found, Hoard might be the fastest, but I hadn't heard of it before today, so I'm skeptical about whether it's really as good as it seems. Does anyone have personal experience trying out these allocators?

+1  A: 

Maybe this is the wrong way to approach what you are asking, but a different tactic altogether might work. If you are looking for a really fast memory allocator, ask why you are spending all that time allocating memory when you could perhaps get away with stack allocation of variables. Stack allocation, while way more annoying, could save you a lot of mutex contention if done right, and it keeps strange memory corruption issues out of your code. You also potentially get less fragmentation, which could help.
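To make that concrete, here is a minimal sketch of the idea in C, not a recommendation for any particular code base. The function name `checksum_copy` and the `SMALL_BUF` threshold are made up for illustration: small requests use a scratch buffer on the stack and never touch the allocator, while large ones still fall back to malloc so the thread's stack is not exhausted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold: requests at or below this size stay on the stack. */
#define SMALL_BUF 256

/* Sums the bytes of `src` while working on a scratch copy.
 * The scratch copy lives on the stack when it is small, and on the heap
 * only when it is too large to place on the stack safely.
 * Returns 0 if the heap fallback fails. */
unsigned long checksum_copy(const unsigned char *src, size_t len)
{
    unsigned char stack_buf[SMALL_BUF];
    unsigned char *buf = stack_buf;

    assert(src != NULL);

    if (len > SMALL_BUF) {
        buf = malloc(len);          /* large request: fall back to the heap */
        if (buf == NULL)
            return 0;
    }

    memcpy(buf, src, len);

    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];

    if (buf != stack_buf)           /* only heap memory needs freeing */
        free(buf);
    return sum;
}
```

In the small case no lock is taken and nothing can leak. Whether that matters for you depends on how many of your allocations are actually that small and short-lived.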

If this is a multithreaded environment, stack allocation is the way to go only for very small objects in small amounts. You don't want to hit the stack size limit on a thread, because overflowing it gives you the same kind of problem as ordinary memory corruption.
hazzen
Yup, I agree with hazzen. Stack allocation, including thread-local storage, can lead to memory corruption if you deal with large to huge data sizes.
trshiv
+1  A: 

We used Hoard on a project where I worked a few years ago. It seemed to work great. I have no experience with the other allocators. It should be pretty easy to try different ones and do load testing, no?

jfm3
+4  A: 

I've used tcmalloc and read about Hoard. Both have similar implementations and both achieve roughly linear performance scaling with respect to the number of threads/CPUs (according to the graphs on their respective sites).

So: if performance is really that incredibly crucial, then do performance/load testing. Otherwise, just roll a die and pick one of those listed (weighted by ease of use on your target platform).

And from trshiv's link, it looks like Hoard, tcmalloc, and ptmalloc are all roughly comparable for speed. Overall, it looks like ptmalloc is optimized for taking as little room as possible, Hoard is optimized for a trade-off of speed + memory usage, and tcmalloc is optimized for pure speed.

hazzen
+3  A: 

I personally prefer and recommend ptmalloc as a multithreaded allocator. Hoard is good, but in the evaluation my team did between Hoard and ptmalloc a few years ago, ptmalloc was better. From what I know, ptmalloc has been around for a number of years and is quite widely used as a multithreaded allocator.

You might find this comparison useful.

trshiv
+4  A: 

The only way to really tell which memory allocator is right for your application is to try a few out. All of the allocators mentioned were written by smart folks and will beat the others on one particular microbenchmark or another. If all your application does all day long is malloc one 8-byte chunk in thread A and free it in thread B, and it doesn't need to handle anything else at all, you could probably write a memory allocator that beats the pants off any of those listed so far. It just won't be very useful for much else. :)

I have some experience using Hoard where I work (enough so that one of the more obscure bugs addressed in the recent 3.8 release was found as a result of that experience). It's a very good allocator - but how good, for you, depends on your workload. And you do have to pay for Hoard (though it's not too expensive) in order to use it in a commercial project without GPL'ing your code.

A very slightly adapted ptmalloc2 has been the allocator behind glibc's malloc for quite a while now, and so it's incredibly widely used and tested. If stability is important above all things, it might be a good choice, but you didn't mention it in your list, so I'll assume it's out. For certain workloads, it's terrible - but the same is true of any general purpose malloc.

If you're willing to pay for it (and the price is reasonable, in my experience), SmartHeap SMP is also a good choice. Most of the other allocators mentioned are designed as drop-in malloc/free new/delete replacements that can be LD_PRELOAD'd. SmartHeap can be used that way as well, but it also includes an entire allocation-related API that lets you fine-tune your allocators to your heart's content. In tests that we've done (again, very specific to a particular application), SmartHeap was about the same as Hoard for performance when acting as a drop-in malloc replacement; the real difference between the two is the degree of customization. You can get better performance the less general-purpose you need your allocator to be.

And depending on your use case, a general-purpose multithreaded allocator might not be what you want to use at all; if you're constantly malloc'ing and free'ing objects that are all the same size, you might want to just write a simple slab allocator. Slab allocation is used in several places in the Linux kernel that fit that description. (I would give you a couple more useful links, but I'm a "new user" and Stack Overflow has decided that new users are not allowed to be too helpful all in one answer. Google can help out well enough, though.)
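To give a flavor of the same-size case, here is a minimal fixed-size pool sketch in C. It is a heavily simplified illustration of the idea (one contiguous slab plus a free list), not the Linux kernel's slab implementation, and all names (`struct pool`, `pool_init`, `pool_alloc`, `pool_free`, `pool_destroy`) are made up. It is also not thread-safe as written; the assumption is that each thread owns its own pool, or that you wrap the calls in a lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A free slot stores the link to the next free slot in its own memory. */
struct slot { struct slot *next; };

struct pool {
    void        *slab;       /* one contiguous block holding every slot */
    struct slot *free_list;  /* singly linked list of unused slots */
    size_t       slot_size;  /* object size rounded up to pointer size */
};

/* Carves `count` slots of at least `obj_size` bytes out of one malloc call.
 * Slots are only pointer-aligned, so this sketch does not suit types that
 * need stricter alignment. Returns 0 on success, -1 on failure. */
int pool_init(struct pool *p, size_t obj_size, size_t count)
{
    assert(p != NULL && count > 0);

    size_t align = sizeof(struct slot);
    size_t slot_size = (obj_size + align - 1) / align * align;
    if (slot_size == 0)
        slot_size = align;

    p->slab = malloc(slot_size * count);
    if (p->slab == NULL)
        return -1;

    p->slot_size = slot_size;
    p->free_list = NULL;

    /* Push slots in reverse so the first slot ends up at the list head. */
    for (size_t i = count; i-- > 0; ) {
        struct slot *s = (struct slot *)((char *)p->slab + i * slot_size);
        s->next = p->free_list;
        p->free_list = s;
    }
    return 0;
}

/* Pops a slot off the free list in O(1); returns NULL when the pool is empty. */
void *pool_alloc(struct pool *p)
{
    struct slot *s = p->free_list;
    if (s != NULL)
        p->free_list = s->next;
    return s;
}

/* Pushes a slot back in O(1). `obj` must have come from this pool. */
void pool_free(struct pool *p, void *obj)
{
    struct slot *s = obj;
    s->next = p->free_list;
    p->free_list = s;
}

/* Releases the whole slab at once; outstanding objects become invalid. */
void pool_destroy(struct pool *p)
{
    free(p->slab);
    p->slab = NULL;
    p->free_list = NULL;
}
```

Allocation and free are each a couple of pointer operations with no size lookup and no searching, which is why this kind of special-purpose allocator can beat a general-purpose malloc on a same-size workload. The cost is that it handles nothing else: the capacity is fixed up front and there is no protection against double frees.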

strangelydim