Honestly, it's trivial to write a program to compare the performance:
#include <ctime>
#include <iostream>

namespace {
class empty { }; // even empty classes take up 1 byte of space, minimum
}

int main()
{
    std::clock_t start = std::clock();
    for (int i = 0; i < 100000; ++i) {
        empty e;
    }
    std::clock_t duration = std::clock() - start;
    std::cout << "stack allocation took " << duration << " clock ticks\n";

    start = std::clock();
    for (int i = 0; i < 100000; ++i) {
        empty* e = new empty;
        delete e;
    }
    duration = std::clock() - start;
    std::cout << "heap allocation took " << duration << " clock ticks\n";
}
It's said that a foolish consistency is the hobgoblin of little minds. Apparently optimizing compilers are the hobgoblins of many programmers' minds. This discussion used to be at the bottom of the answer, but people apparently can't be bothered to read that far, so I'm moving it up here to avoid getting questions that I've already answered.
AN OPTIMIZING COMPILER MAY NOTICE THAT THIS CODE DOES NOTHING, AND MAY OPTIMIZE IT ALL AWAY. IT IS THE OPTIMIZER'S JOB TO DO STUFF LIKE THAT, AND FIGHTING THE OPTIMIZER IS A FOOL'S ERRAND. I WOULD RECOMMEND COMPILING THIS CODE WITH OPTIMIZATION TURNED OFF BECAUSE THERE IS NO GOOD WAY TO FOOL EVERY OPTIMIZER CURRENTLY IN USE OR THAT WILL BE IN USE IN THE FUTURE. ANYBODY WHO TURNS THE OPTIMIZER ON AND THEN COMPLAINS ABOUT FIGHTING IT SHOULD BE SUBJECT TO PUBLIC RIDICULE.

If I cared about nanosecond precision I wouldn't use std::clock(). If I wanted to publish the results as a doctoral thesis I would make a bigger deal about this, and I would probably compare GCC, Tendra/Ten15, LLVM, Watcom, Borland, Visual C++, Digital Mars, ICC and other compilers. As it is, heap allocation takes hundreds of times longer than stack allocation, and I don't see anything useful about investigating the question any further.
The optimizer has a mission to get rid of the code I'm testing. I don't see any reason to tell the optimizer to run and then try to fool the optimizer into not actually optimizing. But if I saw value in doing that, I would do one or more of the following:
1. Add a data member to class empty, and access that data member in the loop; but if I only ever read from the data member, the optimizer can do constant folding and remove the loop; if I only ever write to the data member, the optimizer may skip all but the very last iteration of the loop. Additionally, the question wasn't "stack allocation and data access vs. heap allocation and data access."

2. Declare e volatile, but volatile is often compiled incorrectly (PDF).

3. Take the address of e inside the loop (and maybe assign it to a variable that is declared extern and defined in another file). But even in this case, the compiler may notice that -- on the stack at least -- e will always be allocated at the same memory address, and then do constant folding like in (1) above. I get all iterations of the loop, but the object is never actually allocated.
Beyond the obvious, this test is flawed in that it measures both allocation and deallocation, and the original question didn't ask about deallocation. Of course, variables allocated on the stack are automatically deallocated at the end of their scope, so not calling delete would (1) skew the numbers (stack deallocation is included in the stack allocation numbers, so it's only fair to measure heap deallocation as well) and (2) cause a pretty bad memory leak. We could keep a reference to each new pointer and call delete after we've taken the time measurement, but then keeping those references would again skew the numbers.
On my machine, using g++ 3.4.4 on Windows, I get "0 clock ticks" for both stack and heap allocation for anything less than 100000 allocations, and even then I get "0 clock ticks" for stack allocation and "15 clock ticks" for heap allocation. When I measure 10,000,000 allocations, stack allocation takes 31 clock ticks and heap allocation takes 1562 clock ticks.
Two comments about the code:
Yes, an optimizing compiler may elide creating the empty objects. If I understand correctly, it may even elide the whole first loop. When I bumped up the iterations to 10,000,000 stack allocation took 31 clock ticks and heap allocation took 1562 clock ticks. I think it's safe to say that without telling g++ to optimize the executable, g++ did not elide the constructors.
To be on the safe side, it would be possible to add a field to empty (but then the name "empty" would be incorrect) and access that field, but that would add variable costs to the loop (accessing a field directly is faster than accessing a field through a pointer). Taking the address of e would also work, and "should" take the same amount of time in both loops.
I don't see a need to "calibrate the loops" because I don't care about the actual time spent, only about which loop takes longer. If initializing i, testing i, incrementing i and jumping to the beginning of the loop takes a fixed amount of time in loop A, it will take the same amount of time in loop B. I don't care what that amount of time is, because it doesn't affect the end result (fixed time spent in loop manipulation plus time spent in stack allocation is X; the same amount of time spent in loop manipulation plus time spent in heap allocation is Y; X < Y). Heap allocation is still much worse.
I'll concede I misunderstood the comment about calibrating the loops. I'm leaving that portion so that anybody concerned about the things I had mentioned will have answers.