I am writing a C++ application which will require a block of memory (about 1000 bytes) as a temporary buffer for some text processing. The operation may be repeated up to 10,000 times a second.

Could anybody confirm that it would be more expensive to allocate the memory each time I need the buffer (i.e. new with a smart pointer, so the memory is deallocated when it goes out of scope) than to have a fixed buffer and clear it (write every byte of it with a zero) every time the processing is complete?

It sounds like common sense for C++ but I just cannot find anything on the internet which confirms it.

Is the situation different for computer languages with automatic garbage collection facilities (e.g. Java, .NET)?
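To make the comparison concrete, here is a minimal sketch of the two approaches (the function names and buffer size constant are just for illustration):

#include <cstddef>
#include <cstring>
#include <memory>

const std::size_t BUF_SIZE = 1000;

// Approach 1: allocate a fresh buffer each call; freed when it leaves scope.
void process_with_new(const char * text)
{
    std::unique_ptr<char[]> buf(new char[BUF_SIZE]);
    // ... text processing using buf.get() ...
}

// Approach 2: one fixed buffer, zeroed after each use.
char fixed_buf[BUF_SIZE];

void process_with_fixed(const char * text)
{
    // ... text processing using fixed_buf ...
    std::memset(fixed_buf, 0, BUF_SIZE);   // clear for the next call
}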

+16  A: 

It's probably more expensive to allocate and free the memory each time you need it, but the bigger question is: does it matter? Write it the easiest way you know that works correctly and doesn't leak or corrupt memory. Only then, if your performance isn't good enough, profile the code to see where you can improve it.

Kristo
Sustaining 10,000 allocations per second sounds like it just might matter - if 9,999 iterations in one second is too few, there are real-time constraints, and building provable limits into the design might be necessary. Not all design-time 'optimization' is premature.
Chris Becke
I think Chris hit the nail on the head. To guarantee a throughput level you have to carefully consider the code's algorithmic complexity. Static and local allocations have predictable performance. This is not true of dynamic allocations.
Constantin
+6  A: 

While I can't give you a scholarly "confirmation", consider this:

Whenever a CPU executes an instruction, it costs time. If you tear down your buffer and then reallocate it, the CPU will have to execute the deallocation/reallocation instructions. It will not have to execute those operations if you reuse the buffer.

It's probably safe to say that it's faster to reuse (and we haven't even talked about memory locality, where the same memory can stay in the CPU caches).

With that said, unless you are writing something that must be really, really tight (like a real-time application), or unless you have identified this buffer work as a bottleneck in your app (while you were taking performance measurements), I would say write it in the way that makes the most sense, and is easiest to maintain. On a modern machine, your maintenance costs are likely a larger percentage of the total cost of your software than the performance cost of managing 1,000 bytes.
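A minimal sketch of what reuse could look like (the class and its names are illustrative, not from the answer):

#include <algorithm>
#include <vector>

class TextProcessor
{
    std::vector<char> buf_;   // allocated once, reused for every call
public:
    TextProcessor() : buf_(1000) {}

    void process(const char * text)
    {
        // ... do the text processing in buf_.data() ...
        std::fill(buf_.begin(), buf_.end(), 0);   // clear in place instead of reallocating
    }
};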

JMarsch
+1  A: 

As always, the best way to answer this is to implement it both ways and time/benchmark the two solutions, so you have an actual comparison to work from rather than basing your decision on speculation or on what has worked for others (whose situations may differ from yours in ways you don't realize).
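For instance, a minimal timing harness along those lines might look like this (a sketch; plug in one iteration of either variant):

#include <chrono>
#include <cstdio>

template <typename F>
double seconds_for(F f, int iterations)
{
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        f();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// e.g. std::printf("%f s\n", seconds_for([]{ /* one iteration of either variant */ }, 10000));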

matt b
+4  A: 

Allocating memory (via new or malloc) does not clear it. If the memory MUST be set to 0 before you use it then you will need to clear it either way. In that case, using a static buffer is a big win. Additionally, if you only use a part of the buffer, you can keep track of how much is used and only have to clear the used part.
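A minimal sketch of that partial-clear idea (the buffer size and the 'used' counter are illustrative):

#include <cstddef>
#include <cstring>

char buf[1000];
std::size_t used = 0;   // how many bytes the last operation actually wrote

void finish_operation()
{
    std::memset(buf, 0, used);   // zero only the portion that was touched
    used = 0;
}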

calloc does set the entire buffer to 0, but I can't imagine it being noticeably faster than malloc+memset.

Dolphin
+2  A: 

Concerning Java:

A big difference is that Java guarantees newly allocated memory to be 0 and probably does this faster than you could manually. Also, Java's garbage collector is extremely efficient when allocating and deallocating short-lived objects, while, on the other hand, unnecessarily long-lived objects can cause extra work. So there is a good chance that reallocating the array every time will perform better in Java than in C++.

But garbage collection performance is very hard to predict, so I'd just do it both ways and see which is faster. That would probably take less time than asking the question here and reading all the answers.

Michael Borgwardt
Thank you. I asked the Java question because I know that you have no control over deallocated memory. However, you are correct to point out that any allocated memory will be filled with zeros, unlike in C++, where you have to do the clearing yourself.
Andy
+1  A: 

I'm going to skip the standard "don't worry about optimizations" advice since everyone else has covered that. It will be fastest to declare your buffer as a local in your function and use memset to zero it. With a local buffer, the space is "allocated" by the compiler simply by moving the stack pointer the right number of bytes; allocation doesn't get any faster than that.

With Visual Studio you can add #pragma intrinsic( memset ). I know gcc supports intrinsics too, though I don't remember how to enable them; I believe recent versions use them wherever possible without being told to. The intrinsic memset expands to a few inlined instructions that tell the processor to zero a range of memory, and you won't get any faster than that. That said, don't zero the memory if you don't need to.

In addition, your code will be far clearer simply using a locally declared buffer. From what you've said, your buffer does not need to persist outside of the scope of the routine that is going to use it. 1000 bytes is tiny in modern terms. Using dynamic instead of automatic memory will add a bunch of code that has to be tested and maintained. Don't do it. Automatic memory is the obvious choice.
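In code, that suggestion amounts to something like this (a sketch; the function name is made up):

#include <cstring>

void process_text(const char * input)
{
    char buf[1000];                    // "allocated" by adjusting the stack pointer
    std::memset(buf, 0, sizeof buf);   // usually expands to a few inlined instructions
    // ... text processing using buf ...
}                                      // stack space is reclaimed on return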

Rob K
Bear in mind that stack space is not necessarily unlimited, and you may not be able to allocate arbitrarily large stack objects.
David Thornley
True. The default stack size for Visual Studio is only 1 MB.
Rob K
+1  A: 

If you have done any kind of study of algorithmic efficiency (big-O notation and the like) you'd know (or be able to work out) that most free store implementations can make no guarantees on the lower (or even upper) bounds of the number of iterations that will be executed to find an available block in the free store to satisfy a new/malloc() request.

Using one fixed buffer repeatedly is going to offer orders of magnitude more performance, especially in garbage-collected environments, where unused memory blocks can linger in the free store until a garbage collection cycle is run.

Chris Becke
Actually, there can be guarantees. There is no guarantee of O(1), which is what you get with reuse.
David Thornley
I do wish I could find an online analysis of free store algorithm efficiency.
Chris Becke
A: 

I would have to say that when it comes to a big memory allocation, it is better to do it once rather than each time you need it. The reason is that the memory can become fragmented and allocation can get slow (it costs a lot of resources to repeatedly create and delete a ton of things in memory). If you have a data structure that reserves a sufficient amount of memory for your operations up front, that can be better. The downside of doing so is that a large part of your memory will be tied up for the whole time.
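For example, reserving the memory once up front might look like this (a sketch; the names are illustrative):

#include <vector>

// One allocation for the lifetime of the program, reused on every call.
std::vector<char> buffer(1000);

void process(const char * text)
{
    // ... work in buffer.data() ...
    // no per-call new/delete, so this buffer causes no heap fragmentation
}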

Here is roughly what new and delete look like under the hood in C++:

#include <cstdlib>   // malloc and free live here
#include <new>       // std::bad_alloc
using std::malloc;
using std::free;
using std::bad_alloc;
using std::size_t;

// Simplified: the real operator new keeps retrying via the installed
// new-handler before finally throwing bad_alloc.
void * operator new(size_t n)
{
    void * p = malloc(n);
    if (!p) throw bad_alloc();
    return p;
}

void operator delete(void * p)
{
    free(p);   // free(NULL) is a no-op, so no null check is required
}

Doing a new and delete all the time can be costly! This is part of why languages such as C# and Java tend to be slower than C++. One advantage of a compacting garbage collector is that it brings everything in memory together (it defragments the memory) for your program. This can still be costly if your program keeps a huge number of objects in memory.

Also, have a look at the <algorithm> header in the STL. It might help you by optimizing certain operations.

Partial
Why was this not helpful?
Partial