The only way to really tell which memory allocator is right for your application is to try a few out. All of the allocators mentioned were written by smart folks and will beat the others on one particular microbenchmark or another. If all your application does all day long is malloc one 8-byte chunk in thread A and free it in thread B, and it doesn't need to handle anything else at all, you could probably write a memory allocator that beats the pants off any of those listed so far. It just won't be very useful for much else. :)
I have some experience using Hoard where I work (enough so that one of the more obscure bugs addressed in the recent 3.8 release was found as a result of that experience). It's a very good allocator - but how good, for you, depends on your workload. And you do have to pay for Hoard (though it's not too expensive) in order to use it in a commercial project without GPL'ing your code.
A very slightly adapted ptmalloc2 has been the allocator behind glibc's malloc for quite a while now, and so it's incredibly widely used and tested. If stability is important above all things, it might be a good choice, but you didn't mention it in your list, so I'll assume it's out. For certain workloads, it's terrible - but the same is true of any general purpose malloc.
If you're willing to pay for it (and the price is reasonable, in my experience), SmartHeap SMP is also a good choice. Most of the other allocators mentioned are designed as drop-in malloc/free and new/delete replacements that can be LD_PRELOAD'd. SmartHeap can be used that way as well, but it also includes an entire allocation-related API that lets you fine-tune your allocators to your heart's content. In tests that we've done (again, very specific to a particular application), SmartHeap performed about the same as Hoard when acting as a drop-in malloc replacement; the real difference between the two is the degree of customization. The less general-purpose you need your allocator to be, the better the performance you can get.
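To make "try a few out" concrete, here is a rough sketch of a harness for the toy workload from the first paragraph: thread A mallocs 8-byte chunks and thread B frees them. Everything in it (`run_benchmark`, `struct mailbox`, the batch size) is a name I made up for illustration, and it is no substitute for measuring your real application.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

enum { BATCH = 1024, CHUNK = 8 };   /* arbitrary batch size; 8-byte chunks */

/* One-slot mailbox used to hand a batch of pointers from thread A to B. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    void          **batch;   /* NULL while the slot is empty */
    int             done;    /* set by A after the last batch */
};

/* Thread B: take each batch and free everything that thread A allocated. */
static void *consumer(void *arg) {
    struct mailbox *mb = arg;
    for (;;) {
        pthread_mutex_lock(&mb->lock);
        while (mb->batch == NULL && !mb->done)
            pthread_cond_wait(&mb->changed, &mb->lock);
        void **batch = mb->batch;
        mb->batch = NULL;
        pthread_cond_signal(&mb->changed);   /* slot is empty again */
        pthread_mutex_unlock(&mb->lock);
        if (batch == NULL)
            return NULL;                     /* done and nothing left */
        for (int i = 0; i < BATCH; i++)
            free(batch[i]);
        free(batch);
    }
}

/* Thread A: allocate `rounds` batches; returns elapsed seconds, or -1.0
   if thread B could not be started. */
double run_benchmark(int rounds) {
    struct mailbox mb;
    struct timespec t0, t1;
    pthread_t b;

    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.changed, NULL);
    mb.batch = NULL;
    mb.done = 0;
    if (pthread_create(&b, NULL, consumer, &mb) != 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < rounds; r++) {
        void **batch = malloc(BATCH * sizeof *batch);
        if (batch == NULL)
            abort();
        for (int i = 0; i < BATCH; i++)
            batch[i] = malloc(CHUNK);        /* free(NULL) is harmless */
        pthread_mutex_lock(&mb.lock);
        while (mb.batch != NULL)             /* wait until B took the last one */
            pthread_cond_wait(&mb.changed, &mb.lock);
        mb.batch = batch;
        pthread_cond_signal(&mb.changed);
        pthread_mutex_unlock(&mb.lock);
    }
    pthread_mutex_lock(&mb.lock);
    mb.done = 1;
    pthread_cond_signal(&mb.changed);
    pthread_mutex_unlock(&mb.lock);
    pthread_join(b, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    pthread_cond_destroy(&mb.changed);
    pthread_mutex_destroy(&mb.lock);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Call `run_benchmark` from a small `main` that prints the result, build with `-pthread`, and run the same binary once per allocator, for example with `LD_PRELOAD=/path/to/allocator.so ./bench` (that path is a placeholder, not a real install location). The mailbox handoff is part of what gets timed, so only compare runs of this same harness against each other.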
And depending on your use case, a general-purpose multithreaded allocator might not be what you want to use at all; if you're constantly malloc'ing and free'ing objects that are all the same size, you might want to just write a simple slab allocator. Slab allocation is used in several places in the Linux kernel that fit that description. (I would give you a couple more useful links, but I'm a "new user" and Stack Overflow has decided that new users are not allowed to be too helpful all in one answer. Google can help out well enough, though.)
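For a sense of how little code the same-size case needs, here is a minimal fixed-size pool in the spirit of slab allocation. It is a simplified sketch, not how the Linux kernel's slab allocator works: it uses a single slab, never grows, is not thread-safe, and only guarantees pointer alignment. The `pool_*` names are my own.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct pool {
    void *slab;       /* one big block obtained from malloc */
    void *free_list;  /* linked list threaded through the free objects */
};

/* Carve `count` objects of `obj_size` bytes out of a single slab.
   Returns 0 on success, -1 on failure. */
int pool_init(struct pool *p, size_t obj_size, size_t count) {
    /* A free object stores the next pointer inside itself, so round the
       slot size up to a non-zero multiple of sizeof(void *). */
    size_t slot = (obj_size + sizeof(void *) - 1) / sizeof(void *) * sizeof(void *);
    if (slot == 0)
        slot = sizeof(void *);
    if (count == 0 || slot > SIZE_MAX / count)
        return -1;
    p->slab = malloc(slot * count);
    if (p->slab == NULL)
        return -1;
    p->free_list = NULL;
    /* Push slots in reverse so the first allocation returns slot 0. */
    for (size_t i = count; i-- > 0; ) {
        void *obj = (char *)p->slab + i * slot;
        *(void **)obj = p->free_list;
        p->free_list = obj;
    }
    return 0;
}

/* Pop one object off the free list; returns NULL when the pool is empty. */
void *pool_alloc(struct pool *p) {
    void *obj = p->free_list;
    if (obj != NULL)
        p->free_list = *(void **)obj;
    return obj;
}

/* Push an object back; `obj` must have come from pool_alloc on this pool. */
void pool_free(struct pool *p, void *obj) {
    *(void **)obj = p->free_list;
    p->free_list = obj;
}

/* Release the slab; every object from the pool becomes invalid. */
void pool_destroy(struct pool *p) {
    free(p->slab);
    p->slab = NULL;
    p->free_list = NULL;
}
```

Both alloc and free are a couple of pointer operations with no size lookup, which is where the speed comes from. To use this from several threads you would need a lock around the free list or one pool per thread, and at that point it is worth measuring against the general-purpose allocators again.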