I am currently looking into malloc()
implementation under Windows. But in my research I have stumbled upon things that puzzled me:
First, I know that at the API level, windows uses mostly the HeapAlloc()
and VirtualAlloc()
calls to allocate memory. I gather from here that the Microsoft implementation of malloc()
(that which is included in the CRT - the C runtime) basically calls HeapAlloc()
for blocks > 480 bytes and otherwise manage a special area allocated with VirtualAlloc()
for small allocations, in order to prevent fragmentation.
Well that is all good and well. But then there are other implementation of malloc()
, for instance nedmalloc, which claim to be up to 125% faster than Microsoft's malloc
.
All this makes me wonder a few things:
Why can't we just call
HeapAlloc()
for small blocks? Does is perform poorly in regard to fragmentation (for example by doing "first-fit" instead of "best-fit")?- Actually, is there any way to know what is going under the hood of the various API allocation calls? That would be quite helpful.
What makes
nedmalloc
so much faster than Microsoft'smalloc
?From the above, I got the impression that
HeapAlloc()
/VirtualAlloc()
are so slow that it is much faster formalloc()
to call them only once in a while and then to manage the allocated memory itself. Is that assumption true? Or is themalloc()
"wrapper" just needed because of fragmentation? One would think that system calls like this would be quick - or at least that some thoughts would have been put into them to make them efficient.- If it is true, why is it so?
On average, how many (an order of magnitude) memory reads/write are performed by a typical
malloc
call (probably a function of the number of already allocated segments)? I would intuitively says it's in the tens for an average program, am I right?