When using malloc to allocate memory, is it generally quicker to do multiple mallocs of smaller chunks of data or fewer mallocs of larger chunks of data?

For example, say you are working with an image file that has black pixels and white pixels. You are iterating through the pixels and want to save the x and y position of each black pixel in a new structure that also has pointers to the next and previous pixels' x and y values. Would it be faster to iterate through the pixels, allocating a new structure for each black pixel's x and y values along with the pointers? Or would it be faster to get a count of the number of black pixels by iterating through once, then allocating one large chunk of memory for an array of structures containing just the x and y values (no pointers), then iterating through again and saving the x and y values into that array?

I'm assuming certain platforms might differ from others as to which is faster, but what does everyone think would generally be faster?
Generally speaking, allocating larger chunks of memory fewer times will be faster. There's overhead involved each time a call to malloc() is made.
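As a rough sketch of the two approaches from the question (the struct and function names here are made up for illustration), the per-pixel version pays the malloc overhead once per black pixel, while the array version pays it once in total:

#include <cstdlib>

// Hypothetical node used by the "one malloc per black pixel" approach
struct PixelNode {
    int x, y;
    PixelNode *prev, *next;
};

// Hypothetical plain record used by the "one big malloc" approach
struct Pixel {
    int x, y;
};

// Approach A: one malloc call per black pixel, plus pointer overhead per node
PixelNode *add_node(PixelNode *tail, int x, int y)
{
    PixelNode *n = (PixelNode*)malloc(sizeof(PixelNode));
    n->x = x; n->y = y;
    n->prev = tail; n->next = NULL;
    if (tail) tail->next = n;
    return n;
}

// Approach B: count the black pixels first, then allocate the whole array in one call
Pixel *alloc_all(size_t black_count)
{
    return (Pixel*)malloc(black_count * sizeof(Pixel));
}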
It depends:
- Multiple small allocations mean multiple calls into malloc, which is slower
- There may be a special/fast implementation for small allocations.
If I cared, I'd measure it! If I really cared a lot, and couldn't guess, then I might implement both, and measure at run-time on the target machine, and adapt accordingly.
In general I'd assume that fewer is better: but there are allocation sizes and run-time library implementations such that a (sufficiently) large allocation will be delegated to the (relatively slow) O/S, whereas a (sufficiently) small allocation will be served from a (relatively quick) already-allocated heap.
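If you do want to measure it, something like the following rough sketch would do (the sizes and counts are arbitrary, and the result will vary by platform and allocator):

#include <cstdlib>
#include <cstdio>
#include <ctime>

int main()
{
    const size_t count = 1000000;

    // Many small allocations
    clock_t t0 = clock();
    void **small = (void**)malloc(count * sizeof(void*));
    for (size_t i = 0; i < count; ++i)
        small[i] = malloc(8);
    clock_t t1 = clock();

    // One large allocation of the same total size
    void *big = malloc(count * 8);
    clock_t t2 = clock();

    printf("many small: %ld ticks, one big: %ld ticks\n",
           (long)(t1 - t0), (long)(t2 - t1));

    for (size_t i = 0; i < count; ++i)
        free(small[i]);
    free(small);
    free(big);
    return 0;
}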
Allocating large blocks is more efficient; additionally, since you are using larger contiguous blocks, you have greater locality of reference, and traversing your in-memory structure once you've generated it should also be more efficient! Further, allocating large blocks should help to reduce memory fragmentation.
As already mentioned, malloc is costly, so fewer calls will probably be faster. Also, working with the pixels in one contiguous block will cause fewer cache misses on most platforms and will be faster. However, there is no guarantee on every platform.
Allocating memory is work. The amount of work done when allocating a block of memory is typically independent of the size of the block. You work it out from here.
In general malloc is expensive. It has to find an appropriate memory chunk from which to allocate memory and keep track of non-contiguous memory blocks. In several libraries you will find small memory allocators that try to minimize the impact by allocating a large block and managing the memory in the allocator.
Alexandrescu deals with the problem in 'Modern C++ Design' and in the Loki library if you want to take a look at one such library.
Besides speed issues, there is also the memory fragmentation problem.
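As a sketch of that idea (the names here are invented, and real small-object allocators like Loki's are far more complete), a fixed-size pool can allocate one big block with a single malloc and then hand out slots from a free list:

#include <cstdlib>

// Minimal fixed-size pool: one big malloc up front, slots handed out from a free list.
class FixedPool {
    union Slot { Slot *next; char payload[16]; };  // 16-byte objects, for example
    Slot *block;
    Slot *freeList;
public:
    explicit FixedPool(size_t slots) {
        block = (Slot*)malloc(slots * sizeof(Slot));
        freeList = block;
        for (size_t i = 0; i + 1 < slots; ++i)
            block[i].next = &block[i + 1];
        if (slots) block[slots - 1].next = NULL;
    }
    ~FixedPool() { free(block); }
    void *allocate() {                  // O(1), no call into malloc
        if (!freeList) return NULL;
        Slot *s = freeList;
        freeList = s->next;
        return s;
    }
    void deallocate(void *p) {          // return a slot to the free list
        Slot *s = (Slot*)p;
        s->next = freeList;
        freeList = s;
    }
};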
This question is one of pragmatism, I'm afraid; that is to say, it depends.
If you have a LOT of pixels, only a few of which are black, then counting them might be the highest cost.
If you're using C++, which your tags suggest you are, I would strongly suggest using STL, something like std::vector.
The implementation of vector, if I remember correctly, uses a pragmatic approach to allocation. There are a few heuristics for allocation strategies; an informative one is this:
#include <cstdlib>

class SampleVector {
    int N, used;
    int *data;
public:
    SampleVector() { N = 1; used = 0; data = (int*)malloc(N * sizeof(int)); }
    void push_back(int i)
    {
        if (used >= N)
        {
            // out of space: double the capacity and reallocate
            N *= 2;
            data = (int*)realloc(data, N * sizeof(int));
        }
        data[used++] = i;
    }
};
In this case, you DOUBLE the amount of memory allocated every time you realloc. This means that reallocations progressively halve in frequency.
Your STL implementation will have been well-tuned, so if you can use that, do!
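In the black-pixel case that might look something like this (the raw image buffer and the "0 means black" convention are just placeholders for whatever representation you actually have):

#include <vector>

struct Pixel { int x, y; };

// 'pixels' is assumed to be a row-major buffer where 0 means black.
std::vector<Pixel> collectBlackPixels(const unsigned char *pixels, int width, int height)
{
    std::vector<Pixel> black;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (pixels[y * width + x] == 0) {
                Pixel p = { x, y };
                black.push_back(p);   // amortized O(1): the vector grows its capacity geometrically
            }
    return black;
}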
Next to the allocation overhead itself, allocating multiple small chunks may result in lots of cache misses, whereas iterating through one contiguous block gives the cache and prefetcher a much better chance.
The scenario you describe asks for preallocation of a large block, imho.
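A rough illustration of the difference (the struct names are made up): traversing a contiguous array touches memory sequentially, while walking a linked list of individually malloc'd nodes jumps around the heap:

#include <cstddef>

struct Pixel { int x, y; };
struct PixelNode { int x, y; PixelNode *next; };

// Contiguous: the hardware prefetcher can stream through this.
long sumArray(const Pixel *p, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += p[i].x + p[i].y;
    return sum;
}

// Pointer-chasing: each node may live anywhere, so cache misses dominate.
long sumList(const PixelNode *n)
{
    long sum = 0;
    for (; n; n = n->next)
        sum += n->x + n->y;
    return sum;
}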
Although allocating large blocks is faster per byte of allocated memory, it will probably not be faster if you artificially increase the allocation size only to chop it up yourself. You are just duplicating the memory management.
It's faster not to allocate in performance-sensitive code at all. Allocate the memory you're going to need once in advance, and then use and reuse that as much as you like.
Memory allocation is a relatively slow operation in general, so don't do it more often than necessary.
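For example (the sizes here are made up), a scratch buffer allocated once and reused for every image keeps malloc out of the hot path entirely:

#include <cstdlib>

struct Pixel { int x, y; };

int main()
{
    const size_t maxPixels = 1920 * 1080;            // worst case for one frame
    Pixel *scratch = (Pixel*)malloc(maxPixels * sizeof(Pixel));

    for (int frame = 0; frame < 100; ++frame) {
        size_t used = 0;
        // ... fill scratch[used++] with the black pixels of this frame ...
        // no allocation happens here; the buffer is simply reused
        (void)used;
    }

    free(scratch);
    return 0;
}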
"I can allocate-it-all" (really, I can!)
We can philosophize about special implementations that speed up small allocations considerably ... yes! But in general this holds:
malloc must be general. It must handle all different kinds of allocations, and that is the reason it is considerably slow. It might be that you use some special super-duper library that speeds things up, but even those cannot do wonders, since they have to implement malloc in its full spectrum.
The rule is: when you have more specialized allocation code, you are always faster than the broad "I can allocate-it-all" routine malloc.
So when you are able to allocate the memory in bigger blocks in your code (and it does not cost you too much), you can speed things up considerably. Also - as mentioned by others - you will get a lot less memory fragmentation, which also speeds things up and can cost less memory. You must also see that malloc needs additional bookkeeping memory for every chunk it returns to you (yes, special routines can reduce this ... but you don't know what they really do unless you implemented them yourself or bought some wonder-library).
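A sketch of such specialized allocation code (names invented, error handling and alignment ignored for brevity): grab one big block up front and hand out pieces by bumping an offset, so each "allocation" is just an addition:

#include <cstdlib>

// One big malloc up front; each allocation just advances an offset.
// No per-chunk bookkeeping, no free list; everything is released at once.
class Arena {
    char *base;
    size_t size, offset;
public:
    explicit Arena(size_t bytes) : size(bytes), offset(0) {
        base = (char*)malloc(bytes);
    }
    ~Arena() { free(base); }
    void *allocate(size_t bytes) {
        if (offset + bytes > size) return NULL;   // out of arena space
        void *p = base + offset;
        offset += bytes;
        return p;
    }
    void reset() { offset = 0; }   // "free" everything in one go
};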
Another point to consider is how this interacts with threading. Using malloc many times in a threaded concurrent application is a major drag on performance. In that environment you are better off with a scalable allocator like the one used in Intel's Thread Building Blocks or Hoard. The major limitation of many malloc implementations is a single global lock that all the threads contend for. It can be so bad that adding another thread dramatically slows down your application.
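If TBB is an option, its scalable allocator plugs into standard containers as a drop-in allocator; a minimal sketch, assuming the tbb/scalable_allocator.h header from Intel's Threading Building Blocks is available on your system:

#include <vector>
#include <tbb/scalable_allocator.h>

struct Pixel { int x, y; };

// Allocations are served from per-thread pools instead of contending
// on one global malloc lock.
typedef std::vector<Pixel, tbb::scalable_allocator<Pixel> > PixelList;

int main()
{
    PixelList black;
    Pixel p = { 10, 20 };
    black.push_back(p);
    return 0;
}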
Do an iteration over the pixels to count the number of them to be stored. Then allocate an array for the exact number of items. This is the most efficient solution.
You can use std::vector for easier memory management (see the std::vector::reserve member function). Note: reserve will probably allocate a little (possibly up to 2 times) more memory than necessary.
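Something along these lines (the raw image buffer and the "0 means black" test are placeholders):

#include <vector>

struct Pixel { int x, y; };

std::vector<Pixel> collect(const unsigned char *img, int w, int h)
{
    // First pass: count the black pixels.
    size_t count = 0;
    for (int i = 0; i < w * h; ++i)
        if (img[i] == 0) ++count;

    // Single allocation of exactly the required capacity.
    std::vector<Pixel> black;
    black.reserve(count);

    // Second pass: store the coordinates; no reallocation happens here.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (img[y * w + x] == 0) {
                Pixel p = { x, y };
                black.push_back(p);
            }
    return black;
}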