views: 659
answers: 14

When using malloc to allocate memory, is it generally quicker to do multiple mallocs of smaller chunks of data, or fewer mallocs of larger chunks of data?

For example, say you are working with an image file that has black pixels and white pixels. You are iterating through the pixels and want to save the x and y position of each black pixel in a new structure that also has a pointer to the next and previous pixels' x and y values. Would it generally be faster to iterate through the pixels, allocating a new structure for each black pixel's x and y values with the pointers? Or would it be faster to get a count of the black pixels by iterating through once, then allocate one large chunk of memory using a structure containing just the x and y values but no pointers, and then iterate through again, saving the x and y values into that array?

I'm assuming which is faster may differ from platform to platform, but what does everyone think would generally be faster?
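
For concreteness, here is a rough sketch of the two layouts being compared (the struct and function names are only illustrative):

#include <cstdlib>

// Approach 1: one malloc per black pixel, nodes linked to each other.
struct PixelNode {
    int x, y;
    PixelNode *prev, *next;
};

// Approach 2: pointer-free records packed into one contiguous array.
struct PixelPos {
    int x, y;
};

void sketch(std::size_t blackCount)
{
    // per-pixel allocation: one malloc call for every black pixel found
    PixelNode *node = (PixelNode *)std::malloc(sizeof(PixelNode));

    // single allocation: one malloc call, made after a counting pass
    PixelPos *all = (PixelPos *)std::malloc(blackCount * sizeof(PixelPos));

    std::free(node);
    std::free(all);
}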

+4  A: 

Generally speaking, allocating larger chunks of memory fewer times will be faster. There's overhead involved each time a call to malloc() is made.
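
For example, a fewer-mallocs variant that still keeps the prev/next links is to grab one block for all the nodes and wire them up in place. A minimal sketch (the helper name and node layout are only illustrative):

#include <cstdlib>

struct PixelNode {
    int x, y;
    PixelNode *prev, *next;
};

// One malloc for all the nodes; the whole list is later released
// with a single std::free(head).
PixelNode *make_node_pool(std::size_t count)
{
    PixelNode *pool = (PixelNode *)std::malloc(count * sizeof(PixelNode));
    if (pool == NULL || count == 0)
        return pool;
    for (std::size_t i = 0; i < count; ++i) {
        pool[i].prev = (i > 0)         ? &pool[i - 1] : NULL;
        pool[i].next = (i + 1 < count) ? &pool[i + 1] : NULL;
    }
    return pool;
}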

Jeff L
For more info, have a look at Bonwick's Usenix paper on slab allocation. http://www.usenix.org/publications/library/proceedings/bos94/full_papers/bonwick.a
Dana the Sane
+17  A: 

It depends:

  • Multiple small allocations mean multiple calls to malloc, which is slower
  • There may be a special/fast implementation for small allocations.

If I cared, I'd measure it! If I really cared a lot, and couldn't guess, then I might implement both, and measure at run-time on the target machine, and adapt accordingly.

In general I'd assume that fewer is better: but there are sizes and run-time library implementations such that a (sufficiently) large allocation will be delegated to the (relatively slow) O/S, whereas a (sufficiently) small allocation will be served from an already-allocated (relatively quick) heap.
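
A minimal way to measure it on your own machine and run-time library (the block size and count below are arbitrary, and the numbers only mean anything on the platform you run it on):

#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>

int main()
{
    const std::size_t n = 1000000;
    std::vector<void *> ptrs(n);

    std::clock_t t0 = std::clock();
    for (std::size_t i = 0; i < n; ++i)
        ptrs[i] = std::malloc(16);              // many small allocations
    std::clock_t t1 = std::clock();
    void *big = std::malloc(n * 16);            // one large allocation
    std::clock_t t2 = std::clock();

    std::printf("%lu small mallocs: %.3f s\n", (unsigned long)n,
                (double)(t1 - t0) / CLOCKS_PER_SEC);
    std::printf("one big malloc:    %.3f s\n",
                (double)(t2 - t1) / CLOCKS_PER_SEC);

    for (std::size_t i = 0; i < n; ++i)
        std::free(ptrs[i]);
    std::free(big);
    return 0;
}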

ChrisW
And how do you know -- in general -- that your system has such a wonder-library?
Juergen
To quote ChrisW: "If I cared, I'd measure it!"
dionadar
@all+author: I am just curious that somebody asks a question here and then accepts an answer from somebody who tells us himself that he does not care (dionadar quoted him). I know we are not doing science here, but why ask at all in this situation?
Juergen
@Juergen What I told the OP, perhaps in a clumsy way, was that IMO if he cares then *he* should test it. It's possible (and even perhaps interesting) to guess, but the actual answer is platform-specific.
ChrisW
@ChrisW: I don't think that the OP wanted to measure it when he asked.
Juergen
@Juergen Also I didn't answer your first comment about having "such a wonder-library" because I didn't understand it. Perhaps there *is* no way to know in general, nor any way to *have* it "in general": having any particular library is a *specific* situation. That said, what I said about delegating to the O/S if the request is sufficiently large is, I think, what the MSVC run-time library (for example) does.
ChrisW
@Juergen My answer was that he can guess (and most people who guess will guess that it's the fewer allocations that will be the quicker); but that if he really, really wants to know for sure then he should measure it (or, at the least, say what run-time library he's using): and given that he specifically wants to know *in general*, the fact is that there is no guaranteed general answer.
ChrisW
@all: just about everyone's answers were good; if I could composite one answer from all of them and vote that up, I would. Basically, I gave it to ChrisW because he was the first to answer that "generally" fewer is better, which is what I was asking, although perhaps poorly. The locality of reference and fragmentation issues were ones I did not think of, and I appreciate those answers as well, along with Juergen's pointing out that malloc is general-purpose. Thanks to everyone who answered!
emi1faber
@emi1faber: So your method is to just vote up the *first* answer that says the right catchword, regardless of the other content of the answers? I just cannot compliment such a method. I hope you do no scientific work ... I hope this also for the other upvoters here.
Juergen
Wow, I didn't know votes on StackOverflow got so contentious... Don't worry on the scientific front, I just write software for banks - look out financial system!
+10  A: 

Allocating large blocks is more efficient; additionally, since you are using larger contiguous blocks, you have greater locality of reference, and traversing your in-memory structure once you've generated it should also be more efficient! Further, allocating large blocks should help to reduce memory fragmentation.
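
To illustrate the locality point, traversal of the two layouts from the question looks roughly like this (the types are illustrative, and the sums are just a stand-in for real work):

#include <cstddef>

struct PixelPos  { int x, y; };
struct PixelNode { int x, y; PixelNode *prev, *next; };

// Contiguous traversal: the next element is adjacent in memory, so cache
// lines and the hardware prefetcher work in your favour.
long sum_array(const PixelPos *p, std::size_t n)
{
    long s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += p[i].x + p[i].y;
    return s;
}

// Pointer chasing: each node may live anywhere on the heap, so every
// step can be a cache miss.
long sum_list(const PixelNode *p)
{
    long s = 0;
    for (; p != 0; p = p->next)
        s += p->x + p->y;
    return s;
}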

Nick Lewis
+1 for bringing up locality of reference.
Michael
note that larger blocks can get you worse fragmentation if/when you deallocate them.
Javier
@Javier: In general, deallocating one larger block that consists of smaller blocks is better for fragmentation than allocating/deallocating those smaller blocks by themselves. I can not prove it in 500+ characters, but you can't prove your daring statement either.
Juergen
+1  A: 

As already mentioned, malloc is costly, so fewer calls will probably be faster. Also, working with contiguous pixel data will cause fewer cache misses on most platforms and will be faster. However, there is no guarantee on every platform.

stefaanv
+3  A: 

Allocating memory is work. The amount of work done when allocating a block of memory is typically independent of the size of the block. You work it out from here.

anon
@ovanes: As far as I understand, you are saying the opposite of Neil. Also, you are not trying to beat the compiler but a library routine, so your point is wrong here. If your allocation problem is so complex that you can't beat this routine (I don't see that from the question), then you are in trouble, yes! Or you should study some books.
Juergen
2Juergen: Sorry. You are right I misread the question.
ovanes
+2  A: 

In general malloc is expensive. It has to find an appropriate memory chunk from which to allocate memory and keep track of non-contiguous memory blocks. In several libraries you will find small memory allocators that try to minimize the impact by allocating a large block and managing the memory in the allocator.

Alexandrescu deals with the problem in 'Modern C++ Design' and in the Loki library, if you want to take a look at one such library.
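
A heavily simplified sketch of the idea behind such small-object/pool allocators (this is not Loki's actual interface, just the underlying trick: one big allocation up front, then a free list):

#include <cstdlib>

// One big malloc up front; allocate()/deallocate() just pop/push a free
// list, so they are O(1) and never touch the general-purpose heap.
// Assumes nodeSize >= sizeof(void*) and a multiple of the required alignment.
class NodePool {
public:
    NodePool(std::size_t count, std::size_t nodeSize)
        : block_(std::malloc(count * nodeSize)), free_(0)
    {
        char *p = (char *)block_;
        for (std::size_t i = 0; block_ != 0 && i < count; ++i) {
            FreeNode *n = (FreeNode *)(p + i * nodeSize);
            n->next = free_;
            free_ = n;
        }
    }
    ~NodePool() { std::free(block_); }

    void *allocate()
    {
        if (free_ == 0) return 0;       // pool exhausted
        FreeNode *n = free_;
        free_ = n->next;
        return n;
    }

    void deallocate(void *p)
    {
        FreeNode *n = (FreeNode *)p;
        n->next = free_;
        free_ = n;
    }

private:
    struct FreeNode { FreeNode *next; };
    void *block_;
    FreeNode *free_;
};

Allocating a node then costs a couple of pointer operations instead of a full trip through malloc.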

David Rodríguez - dribeas
+4  A: 

Besides speed issues, there is also the memory fragmentation problem.

Nick D
+2  A: 

This question is one of pragmatism, I'm afraid; that is to say, it depends.

If you have a LOT of pixels, only a few of which are black, then counting them might be the highest cost.

If you're using C++, which your tags suggest you are, I would strongly suggest using the STL, something like std::vector.

The implementation of vector, if I remember correctly, uses a pragmatic approach to allocation. There are a few heuristics for allocation strategies; an informative one is this:

#include <cstdlib>   // malloc, realloc, free

class SampleVector {
    int N, used, *data;
public:
    SampleVector() { N = 1; used = 0; data = (int *)std::malloc(N * sizeof(int)); }
    ~SampleVector() { std::free(data); }

    void push_back(int i)
    {
        if (used >= N)
        {
            // out of room: double the capacity and reallocate
            N *= 2;
            data = (int *)std::realloc(data, N * sizeof(int));
        }
        data[used++] = i;
    }
};

In this case, you DOUBLE the amount of memory allocated every time you realloc. This means that reallocations progressively halve in frequency.

Your STL implementation will have been well-tuned, so if you can use that, do!

Dave Gamble
I disagree. You iterate through the pixels anyway - do not try to save the extra iteration of counting the pixels to be stored. The reallocation and copying of a std::vector behind the scenes take more time.
Zoli
But of course it's still a function of how many pixels you store, vs how many you have to iterate through.
Dave Gamble
+1  A: 

Besides the allocation overhead itself, allocating multiple small chunks may result in lots of cache misses, whereas iterating through one contiguous block gives you much better cache behaviour.

The scenario you describe asks for preallocation of a large block, imho.

xtofl
A: 

Although allocating large blocks is faster per byte of allocated memory, it will probably not be faster if you artificially increase the allocation size only to chop it up yourself. You are just duplicating the memory management.

Hans Malherbe
+3  A: 

It's faster not to allocate in performance-sensitive code at all. Allocate the memory you're going to need once in advance, and then use and reuse that as much as you like.

Memory allocation is a relatively slow operation in general, so don't do it more often than necessary.
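
For example (a sketch, assuming the image data is processed frame by frame): keep one buffer alive and reuse it, so only the first frame ever pays for allocation.

#include <vector>

struct PixelPos { int x, y; };

void process_frames(int frameCount)
{
    std::vector<PixelPos> blackPixels;
    blackPixels.reserve(100000);        // made-up initial guess

    for (int f = 0; f < frameCount; ++f) {
        blackPixels.clear();            // keeps the capacity: no reallocation
        // ... scan the frame and push_back each black pixel's position ...
        // ... use blackPixels ...
    }
}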

jalf
+1  A: 

"I can allocate-it-all" (really, I can!)

We can philosophize about special implementations that speed up small allocations considerably ... yes! But in general this holds:

malloc must be general. It must handle all the different kinds of allocations; that is the reason it is comparatively slow! It might be that you use some special super-duper library that speeds things up, but even those cannot do wonders, since they still have to implement malloc in its full spectrum.

The rule is: when you have more specialized allocation code, you are always faster than the broad "I can allocate-it-all" routine malloc.

So when you are able to allocate the memory in bigger blocks in your code (and it does not cost you too much), you can speed things up considerably. Also - as mentioned by others - you will get a lot less memory fragmentation, which also speeds things up and can cost less memory. You must also see that malloc needs additional memory for every chunk it returns to you (yes, special routines can reduce this ... but you don't know what they really do unless you implemented them yourself or bought some wonder-library).

Juergen
+2  A: 

Another point to consider is how this interacts with threading. Calling malloc many times in a threaded, concurrent application is a major drag on performance. In that environment you are better off with a scalable allocator like the one used in Intel's Threading Building Blocks or Hoard. The major limitation with malloc is that there is a single global lock that all the threads contend for. It can be so bad that adding another thread dramatically slows down your application.
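
For instance, assuming you are using TBB and linking against its tbbmalloc library, a container can be pointed at the scalable allocator like this (a sketch; check your TBB version's documentation for the exact header and linkage):

#include <vector>
#include <tbb/scalable_allocator.h>

struct PixelPos { int x, y; };

// Per-thread result lists backed by TBB's scalable allocator, so the
// worker threads are not all serialized on one global heap lock.
typedef std::vector<PixelPos, tbb::scalable_allocator<PixelPos> > PixelList;

void worker(PixelList &localHits)   // one PixelList per thread
{
    PixelPos p = { 10, 20 };        // illustrative values
    localHits.push_back(p);
}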

Matt Price
+1  A: 

Iterate over the pixels once to count how many need to be stored, then allocate an array for exactly that number of items. This is the most efficient solution.

You can use std::vector for easier memory management (see std::vector::reserve). Note: it may allocate a little (possibly up to 2 times) more memory than necessary.
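
A sketch of that two-pass approach (is_black(), width and height stand in for whatever your image API provides):

#include <cstddef>
#include <vector>

struct PixelPos { int x, y; };

bool is_black(int x, int y);   // assumed image accessor

std::vector<PixelPos> collect_black_pixels(int width, int height)
{
    std::size_t count = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (is_black(x, y))
                ++count;

    std::vector<PixelPos> result;
    result.reserve(count);             // one allocation, exactly sized
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (is_black(x, y)) {
                PixelPos p = { x, y };
                result.push_back(p);
            }
    return result;
}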

Zoli