views: 647
answers: 8

Consider two applications: one (#1) that invokes malloc() many times, and another (#2) that invokes malloc() only a few times. Both applications allocate the same total amount of memory (assume 100MB).
For which application will the next malloc() call be faster, #1 or #2?
In other words: does malloc() have an index of allocated locations in memory?

+2  A: 

These are of course implementation details, but typically free() will insert the memory into a list of free blocks. malloc() will then look at this list for a free block that is the right size, or larger. Typically, only if this fails does malloc() ask the kernel for more memory.
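
A minimal sketch of that scheme, with free() pushing blocks onto a singly linked free list and malloc() scanning it first-fit before falling back to the OS (all identifiers here are made up for illustration, not any real libc's internals):

    #include <stddef.h>
    #include <unistd.h>   /* sbrk() */

    typedef struct block {
        size_t size;          /* payload size in bytes */
        struct block *next;   /* next block on the free list */
    } block_t;

    static block_t *free_list = NULL;

    void *my_malloc(size_t size)
    {
        /* First-fit: take the first free block that is big enough
         * (no splitting of oversized blocks, for brevity). */
        for (block_t **p = &free_list; *p != NULL; p = &(*p)->next) {
            if ((*p)->size >= size) {
                block_t *b = *p;
                *p = b->next;             /* unlink it from the free list */
                return (void *)(b + 1);   /* payload follows the header */
            }
        }
        /* Nothing suitable: only now ask the kernel for more memory. */
        block_t *b = sbrk(sizeof(block_t) + size);
        if (b == (void *)-1)
            return NULL;
        b->size = size;
        return (void *)(b + 1);
    }

    void my_free(void *ptr)
    {
        if (ptr == NULL)
            return;
        block_t *b = (block_t *)ptr - 1;  /* recover the header */
        b->next = free_list;              /* push onto the free list */
        free_list = b;
    }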

There are also other considerations, such as when to coalesce multiple adjacent blocks into a single, larger block.
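
Continuing the sketch above, coalescing can be done by keeping the free list sorted by address and merging a freed block with a physically adjacent neighbour (again purely illustrative; a full implementation would also merge with the preceding block):

    void my_free_coalescing(void *ptr)
    {
        block_t *b = (block_t *)ptr - 1;

        /* Find the insertion point that keeps the list address-ordered. */
        block_t **p = &free_list;
        while (*p != NULL && *p < b)
            p = &(*p)->next;

        /* If the next free block starts right after ours, merge them. */
        if (*p != NULL && (char *)(b + 1) + b->size == (char *)*p) {
            b->size += sizeof(block_t) + (*p)->size;
            b->next = (*p)->next;
        } else {
            b->next = *p;
        }
        *p = b;
    }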

And another reason that malloc() is expensive: if malloc() is called from multiple threads, there must be some kind of synchronization on these global structures (i.e., locks). There are malloc() implementations with different optimization schemes to make them better for multiple threads, but generally, keeping malloc() multi-thread safe adds to the cost, as multiple threads will contend for those locks and block progress on each other.
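
To see why the locking matters, here is a sketch of the worst case: a single global mutex serializing every allocation, the way a naive thread-safe wrapper would. (Note that malloc() in modern libcs is already thread-safe; this wrapper exists only to illustrate the contention point, and real allocators such as glibc's per-thread arenas work hard to avoid exactly this bottleneck.)

    #include <pthread.h>
    #include <stdlib.h>

    static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

    void *locked_malloc(size_t size)
    {
        pthread_mutex_lock(&heap_lock);   /* every thread contends here */
        void *p = malloc(size);
        pthread_mutex_unlock(&heap_lock);
        return p;
    }

    void locked_free(void *ptr)
    {
        pthread_mutex_lock(&heap_lock);
        free(ptr);
        pthread_mutex_unlock(&heap_lock);
    }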

asveikau
+1  A: 

The answer is that it depends: most of the potential slowness comes from malloc() and free() in combination, and usually #1 and #2 will be of similar speed.

All malloc() implementations do have an indexing mechanism, but the speed of adding a new block to the index is usually not dependent on the number of blocks already in the index.

Most of the slowness of malloc() comes from two sources:

  • searching for a suitable free block among the previously freed blocks
  • multi-processor problems with locking

Writing my own almost-standards-compliant malloc() replacement took malloc()/free() time from 35% of the runtime down to 3-4%, because it seriously optimised those two factors. It would likely have been similarly fast to use some other high-performance malloc, but having our own was more portable to esoteric devices and, of course, allowed free() to be inlined in some places.

Erik Elmgren
+3  A: 

malloc() has to run through a linked list of free blocks to find one to allocate. This takes time. So #1 will usually be slower:

  • The more often you call malloc, the more time it will take - so reducing the number of calls will give you a speed improvement (though whether it is significant will depend on your exact circumstances).

  • In addition, if you malloc() many small blocks, then as you free those blocks you will fragment the heap much more than if you only allocate and free a few large blocks. You are likely to end up with many small free blocks on your heap rather than a few big ones, so your malloc() calls may have to search further through the free-space lists to find a suitable block, which again will make them slower (see the sketch after this list).
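
Here is a small, self-contained demonstration of that fragmentation pattern; the sizes and counts are arbitrary illustration values, and how much the final search actually costs depends on the allocator:

    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000

    int main(void)
    {
        void *blocks[N];

        /* Many small allocations... */
        for (int i = 0; i < N; i++)
            blocks[i] = malloc(64);

        /* ...then free every other one, leaving scattered 64-byte holes. */
        for (int i = 0; i < N; i += 2) {
            free(blocks[i]);
            blocks[i] = NULL;
        }

        /* None of those holes can satisfy this request, so the allocator
         * must pass over all of them and then grow the heap. */
        void *big = malloc(64 * 1024);
        printf("large block at %p\n", big);

        free(big);
        for (int i = 1; i < N; i += 2)
            free(blocks[i]);
        return 0;
    }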

Jason Williams
+1 heap fragmentation can kill performance if you have lots of small objects on the heap.
pjc50
+6  A: 

You asked 2 questions:

  • For which application will the next malloc() call be faster, #1 or #2?
  • In other words: does malloc() have an index of allocated locations in memory?

You've implied that they are the same question, but they are not. The answer to the latter question is YES.

As for which will be faster, it is impossible to say. It depends on the allocator algorithm, the machine state, the fragmentation in the current process, and so on.

Your idea is sound, though: you should think about how malloc usage will affect performance. I once wrote an app that used lots of little blobs of memory, each allocated with malloc(). It worked correctly but was slow. I replaced the many calls to malloc() with just one, and then sliced up that large block within my app. It was much, much faster.
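
A sketch of that rewrite (the blob size and count here are made-up illustration values, not the original app's numbers):

    #include <stdlib.h>

    #define BLOB_SIZE  32
    #define BLOB_COUNT 10000

    int main(void)
    {
        /* One allocation instead of BLOB_COUNT separate ones. */
        char *pool = malloc((size_t)BLOB_SIZE * BLOB_COUNT);
        if (pool == NULL)
            return 1;

        /* Hand out fixed-size blobs by simple pointer arithmetic:
         * no per-blob malloc(), no free-list search per blob. */
        char *blob[BLOB_COUNT];
        for (int i = 0; i < BLOB_COUNT; i++)
            blob[i] = pool + (size_t)i * BLOB_SIZE;

        /* ... use the blobs ... */

        free(pool);   /* one free() releases everything at once */
        return 0;
    }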

I don't recommend this approach; it's just an illustration of the point that malloc usage can materially affect performance.

My advice is to measure it.

Cheeso
+1  A: 

You don't define the relative difference between "many" and "few", but I suspect most mallocs would function almost identically in both scenarios. The question implies that each call to malloc() has as much overhead as a system call and page table updates. When you make a malloc() call, e.g. malloc(14), in a non-brain-dead environment, malloc() will actually allocate more memory than you ask for, often a multiple of the system MMU page size. You get your 14 bytes, and malloc() keeps track of the newly allocated area so that later calls can just return a chunk of the already-allocated memory, until more memory needs to be requested from the OS.

In other words, if I call malloc(14) 100 times or malloc(1400) once, the overhead will be about the same. I'll just have to manage the bigger allocated memory chunk myself.
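
On glibc you can observe that rounding-up with malloc_usable_size() (a glibc extension, not standard C); the exact number printed varies by allocator and platform:

    #include <stdio.h>
    #include <stdlib.h>
    #include <malloc.h>   /* malloc_usable_size(), glibc-specific */

    int main(void)
    {
        void *p = malloc(14);
        /* Typically prints more than 14, e.g. 24 on 64-bit glibc. */
        printf("requested 14 bytes, usable: %zu\n", malloc_usable_size(p));
        free(p);
        return 0;
    }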

Richard Pennington
+1  A: 

Of course this completely depends on the malloc implementation, but in this case, with no calls to free, most malloc implementations will probably give you the same algorithmic speed.

As another answer commented, usually there will be a list of free blocks, but if you have not called free, there will just be one, so it should be O(1) in both cases.

This assumes that the memory allocated for the heap is big enough in both cases. In case #1 you will have allocated more total memory, as each allocation carries memory overhead to store metadata; as a result, you may need to call sbrk() or equivalent to grow the heap, which would add additional overhead.
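
To put rough numbers on that overhead (assuming, purely for illustration, a 16-byte header per block): allocating 100MB as a million 100-byte blocks costs about 16MB of extra metadata, while allocating the same 100MB as ten 10MB blocks costs only 160 bytes.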

The two cases will probably differ due to cache and other second-order effects, since the memory alignment of the new allocation won't be the same.

If you have been freeing some of the memory blocks, then it is likely that #2 will be faster due to less fragmentation, and so a smaller list of free blocks to search.

If you have freed all the memory blocks, it should end up being exactly the same, since any sane free implementation will have coalesced the blocks back into a single arena of memory.

benno
+1  A: 

You can always do a better job by using malloc() to allocate a large chunk of memory and sub-dividing it yourself. malloc() was optimized to work well in the general case and makes no assumptions about whether or not you use threads or what the size of the program's allocations might be.

Whether it is a good idea to implement your own sub-allocator is a secondary question. It rarely is: explicit memory management is already hard enough, and you rarely need another layer of code that can screw up and crash your program without any good way to debug it. Unless you are writing a debug allocator.

Hans Passant
A: 

Allocating one block of memory is faster than allocating many blocks: there is the overhead of the system call and also of searching for available blocks. In programming, reducing the number of operations usually reduces execution time.

Memory allocators may have to search to find a block of memory that is the correct size. This adds to the overhead of the execution time.

However, there may be better chances of success when allocating small blocks of memory versus one large block. Is your program allocating one small block and releasing it, or does it need to allocate (and preserve) many small blocks? When memory becomes fragmented, there are fewer big chunks available, so the memory allocator may have to coalesce blocks to form one big enough for the allocation.

If your program is allocating and destroying many small blocks of memory, you may want to consider allocating a static array and using that for your memory.
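
A sketch of that approach: a fixed pool of same-sized blocks carved out of a static array, with a free list threaded through the unused slots (the sizes are illustrative; both operations are O(1)):

    #include <stddef.h>

    #define BLOCK_SIZE  64
    #define BLOCK_COUNT 1024

    static union slot {
        union slot *next;          /* link while on the free list */
        char payload[BLOCK_SIZE];  /* user data while allocated */
    } pool[BLOCK_COUNT];

    static union slot *free_slots = NULL;
    static size_t used = 0;        /* slots never yet handed out */

    void *pool_alloc(void)
    {
        if (free_slots != NULL) {  /* reuse a freed slot */
            union slot *s = free_slots;
            free_slots = s->next;
            return s;
        }
        if (used < BLOCK_COUNT)    /* hand out a fresh slot */
            return &pool[used++];
        return NULL;               /* pool exhausted */
    }

    void pool_free(void *ptr)
    {
        union slot *s = ptr;
        s->next = free_slots;      /* push back onto the free list */
        free_slots = s;
    }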

Thomas Matthews