As people have pointed out - GC is faster to allocate (because it just gives you the next block on its list), but slower overall (because it has to compact the heap regularly, in order for allocs to be fast).
so - go for the compromise solution (which is actually pretty damn good):
You create your own heaps, one for each size of object you generally allocate (or 4-byte, 8 byte, 16-byte, 32-byte, etc) then, when you want a new piece of memory you grab the last 'block' on the appropriate heap. Because you pre-allocate from these heaps, all you need to do when allocating is grab the next free block. This works better than the standard allocator because you are happily wasting memory - if you want to allocate 12 bytes, you'll give up a whole 16 byte block from the 16-byte heap. You keep a bitmap of used v free blocks so you can allocate quickly without wasting loads of memory or needing to compact.
Also, because you're running several heaps, highly-parallel systems work much better as you don't need to lock so often (ie you have multiple locks for each heap so you don't get contention nearly as much)
Try it - we used it to replace the standard heap on a very intensive application, performance went up by quite a lot.
BTW. the reason the standard allocators are slow is that they try not to waste memory - so if you allocate a 5 byte, 7 byte and 32 bytes from the standard heap, it'll keep those 'boundaries'. Next time you need to allocate, it'll walk through those looking for enough space to give you what you asked for. That worked well for low-memory systems, but you only have to look at how much memory most apps use today to see that GC systems go the other way, and try to make allocations as fast as possible whilst caring nothing for how much memory is wasted.