views:

302

answers:

7

I'm no expert on how processors work, but one might imagine that it is easier to set chunks of memory to zero than to non-zero values, and so it may be marginally faster.

A: 

If you can do that with the help of the virtual memory system, you can get zeroed (not-yet-allocated) pages faster than non-zero pages. Such an optimization is normally not used in C++ applications (e.g. in the standard library implementation), so do not expect any difference between allocating a std::vector filled with zero versus some other value.

Tronic
+8  A: 

I think the only difference would be in setting up the register that has the value to store to memory. Some processors have a register that's fixed at zero (ia64 for example). Even so, whatever minuscule overhead there might be for setting up a register will be monstrously dwarfed by the writing to memory.

As far as the time to actually write to the memory - that'll be clocked the same on all architectures I'm familiar with.

Michael Burr
+1  A: 

I have no idea, because of the number of factors involved, but the way to find out is to code it both ways and benchmark them.

It's worth noting that the Windows VirtualAlloc function initializes newly-allocated memory to zero, although the Microsoft debug C++ runtime resets it to dummy values for you afterwards. If you want a quick source of zero-initialized memory it may be worthwhile going direct to the OS.

Tim Robinson
I don't know which Windows allocation function you are talking about, but if it is GlobalAlloc, it does not allocate zero-set memory by default, though it can be requested to.
anon
@Neil: VirtualAlloc (MEM_COMMIT) allocates 0 filled memory.
Remus Rusanu
Are you sure about the Windows initialization thing? It would be quite power- and time-consuming to set a large part of memory whether the program requires it or not, wouldn't it?
Seb
btw, if you benchmark it, you should also consider the impact of the existing values in memory. Maybe overwriting 0s is more expensive than overwriting 0xFFs? ;) jk
Remus Rusanu
@Seb: a process that requests a new page will get it from the zero-pages list. Otherwise a process might get a page with content left over from another process, which would be a huge security violation. The kernel zeroes out free pages when the zero-pages list gets low. This is all described in Solomon and Russinovich's book.
Remus Rusanu
@Remus The fact it is done by the kernel does not automatically make it faster, though.
anon
@Neil The fact the kernel already does it makes re-zeroing memory from VirtualAlloc redundant.
Tim Robinson
@Neil: My comment is only about whether zero-filled allocation exists or not. I did not make any comment on the OP topic, whether is faster or slower than a non-zero pattern fill.
Remus Rusanu
When `VirtualAlloc` allocates zero-filled memory, it does so by performing *lazy* zeroing, as I described in my answer. The physical zeroing takes place on a per-page basis on the first read access (if the first access is indeed a *read* access) and never takes place if the first access is a write access. This is in general much faster than unconditionally steamrolling over the memory region with `memset`.
AndreyT
I could be oversimplifying things, though. Since it is done on a per-page basis (if this is so), then of course the only case in which we can avoid the physical zeroing entirely is when the first access is a *full-page* write access.
AndreyT
@AndreyT: the first time a page fault occurs on a newly allocated page, the system will get a page from the zero list and assign it to the process. You are describing it as if the page is allocated and marked as protected, and then zeroed on access (i.e. lazy zeroing), which is inaccurate.
Remus Rusanu
@Remus Rusanu: I believe I saw it described in this very way (as lazy zeroing) in Jeffrey Richter's "Advanced Windows". This was a while ago though, so it could be outdated.
AndreyT
A: 

It would be faster if there were a CPU instruction for setting a memory cell to zero, but there is none.

Andrey
A: 

A very common optimization on Intel architectures is to use the xor a,b operation where both operands are the same location. This removes any need to store the value in a register and perform a move operation. So if the library uses this optimization, writing zeros is faster.

I have to correct myself: XOR is used only if both operands are registers.

aaa
Ummh, wouldn't that require a memory read, followed by the xor operation, followed by a memory write? That would be very slow.
Tronic
@Tronic, I think you are right. Reading the assembly, it seems xor is used only when both operands are XMM registers.
aaa
+3  A: 

Theoretically, it might indeed be faster.

Firstly, the hardware platform might offer dedicated CPU instruction(s) that set memory to zero.

Secondly, setting memory to zero specifically might be supported by the OS/hardware as a lazy operation, i.e. the act of setting memory to zero doesn't really do anything immediately besides marking the memory region for zeroing on first access. (Of course, something like that is only possible with memory regions managed at the OS/hardware level.)

The latter is actually one of the reasons the calloc function exists: on some platforms it can be implemented significantly more efficiently than a mere malloc followed by a memset to zero. On such platforms the effect can be tremendous, not "marginal".

AndreyT
+1. And some OSes maintain a pool of zeroed pages, which they can zero "when there is free time".
tony
+2  A: 

It can be faster on PPC if you align the buffers, since you can just use the dcbz cache instruction. It's not something you should count on being faster in all cases, though.

An article that mentions this: http://www.ibm.com/developerworks/power/library/pa-memory/index.html

Dan Olson
Thanks for that link. I've never seen a cache-specific instruction before.
Mark Ransom