I'm no expert on how processors work, but one might imagine that it is easier to set chunks of memory to zero than to non-zero values, and so it may be marginally faster.
If you can do it with the help of the virtual memory system, you can get zeroed (not-yet-allocated) pages faster than non-zero pages. Such an optimization is normally not used in C++ applications (e.g. in the standard library implementation), so don't expect any difference between allocating a std::vector filled with zeros versus one filled with some other value.
I think the only difference would be in setting up the register that holds the value to be stored to memory. Some processors have a register that is hardwired to zero (IA-64, for example). Even so, whatever minuscule overhead there might be in setting up a register will be monstrously dwarfed by the write to memory.
As for the time to actually write to the memory, that'll be clocked the same on all architectures I'm familiar with.
I have no idea, because of the number of factors involved, but the way to find out is to code it both ways and benchmark them.
It's worth noting that the Windows VirtualAlloc function initializes newly allocated memory to zero, although the Microsoft debug C++ runtime then overwrites it with dummy values for you. If you want a quick source of zero-initialized memory, it may be worthwhile to go directly to the OS.
It would be faster if there were a CPU instruction for setting a memory cell to zero, but there is none.
A very common optimization on the Intel architecture is to use the xor a,b operation where both operands are the same location. This removes any need to load the value into a register and perform a move. So if the library uses this optimization, writing zeros is faster.
I have to correct myself: XOR is only used when both operands are registers.
Theoretically, it might indeed be faster.
Firstly, the hardware platform might offer a dedicated CPU instruction (or instructions) that sets memory to zero.
Secondly, setting memory to zero specifically might be supported by OS/hardware as a lazy operation, i.e. the act of actually setting memory to zero doesn't really do anything besides simply marking this memory region for zeroing on the first read. (Of course, something like that is only possible with memory regions managed at OS/hardware level).
The latter is actually one of the reasons the calloc function exists: on some platforms it can be implemented significantly more efficiently than a mere malloc followed by a memset to zero. On such platforms the effect will be tremendous, not "marginal".
It can be faster on PPC if you align the buffers, since you can just use the dcbz cache instruction. It's not something you should count on as being faster in all cases.
An article that mentions this: http://www.ibm.com/developerworks/power/library/pa-memory/index.html