views: 218
answers: 7
Which is, on average, faster - check the value then, if needed, assign, or simply assign? Or, in C++ terms:

bool b;     // assume b already holds a value, true with 50% probability
if(b)
    b = false;

or

b = false;

Assume that the if() condition is true with 50% probability. The answer will most likely be highly architecture dependent - please voice your low-level considerations. Writing always dirties the cache line - right? So by avoiding a write we avoid a cache flush in half the cases. But a smart enough cache might detect a trivial write and not dirty itself. On the other hand, the unconditional write is always exactly one memory operation, while read-then-write is, on average, 1.5 operations.

Disclaimer: this is a curiosity question, not a problem I actually face.
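For the curious, the two variants can be wrapped in a small, hypothetical benchmark harness like the one below (all names are illustrative; real results will depend heavily on compiler optimization level, since a good optimizer may simplify or vectorize both loops):

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Variant 1: check the value, assign only if needed.
void cond_clear(std::vector<char>& v) {
    for (auto& f : v)
        if (f) f = 0;
}

// Variant 2: unconditional assign.
void uncond_clear(std::vector<char>& v) {
    for (auto& f : v)
        f = 0;
}

// Fill a buffer with values that are true with 50% probability,
// matching the assumption in the question.
std::vector<char> random_flags(std::size_t n, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);
    std::vector<char> v(n);
    for (auto& f : v) f = coin(rng) ? 1 : 0;
    return v;
}

// Time one variant over a buffer, in microseconds.
template <typename F>
long long time_us(F f, std::vector<char>& v) {
    auto t0 = std::chrono::steady_clock::now();
    f(v);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```

One would compare time_us(cond_clear, a) against time_us(uncond_clear, b) on two copies of the same random buffer, repeating the runs on the target architecture before drawing any conclusions.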

+1  A: 

It would definitely be worth profiling this on different architectures to get actual results.

ElectricDialect
+1  A: 

It depends on various things:

  • how predictable the branch is (in the first scenario)
  • whether b is already in a register
  • what architecture you are using
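To make the first bullet concrete: the check-then-assign loop can behave very differently on data with the same 50/50 mix of values but different orderings. A hypothetical pair of inputs (names are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Predictable pattern: first half true, second half false. The branch
// changes direction only once, so a branch predictor does very well.
std::vector<char> predictable(std::size_t n) {
    std::vector<char> v(n, 0);
    std::fill(v.begin(), v.begin() + n / 2, 1);
    return v;
}

// Unpredictable pattern: the same values shuffled, so each iteration's
// branch is close to a coin flip and mispredicts often.
std::vector<char> unpredictable(std::size_t n, unsigned seed = 1) {
    auto v = predictable(n);
    std::mt19937 rng(seed);
    std::shuffle(v.begin(), v.end(), rng);
    return v;
}
```

Timing `if (f) f = 0;` over each buffer would expose the predictor's effect; the unconditional `f = 0;` is insensitive to the ordering.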
Paul R
+1  A: 

In addition to the suggestions to profile, it also really depends on what memory is backing that write - if it's a memory-mapped flash device, for example, the write might be extremely costly.

Carl Norum
A: 

If you are assigning a pointer, reference, or basic value type, I personally think the direct assignment will be faster (keen to see the outcome on a profiler). In a 50% probability environment, the conditional version will potentially execute a lot more instructions than simply putting a value into a register. Assigning a struct or class object that triggers an assignment operator will be the most expensive. Conditional logic also introduces more instructions and adds to your code complexity metrics.

Fadrian Sudaman
+1  A: 

Recently I have been reading papers on very fast compression techniques, and the authors stressed the need to avoid if branching to achieve the best performance. The reason is CPU pipelining: using ifs breaks many of the optimizations a CPU can make to execute parts of the code in parallel. So, if you had a lot of these operations, it might be faster to use b = false.
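The specific `if (b) b = false;` already has a branch-free equivalent, `b = false;`, but for the general pattern `if (cond) x = val;` the usual branch-free rewrite is a ternary, which compilers commonly lower to a conditional move (e.g. cmov on x86). A sketch using a hypothetical clamp function:

```cpp
#include <cassert>

// Branchy form: may compile to a conditional jump that can mispredict.
int clamp_branchy(int x) {
    if (x < 0) x = 0;
    return x;
}

// Branch-free form: typically lowered to a conditional move,
// leaving nothing for the branch predictor to miss.
int clamp_branchless(int x) {
    return x < 0 ? 0 : x;
}
```

Both forms compute the same result; only the generated instructions (and thus the pipeline behavior) may differ.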

pajton
+4  A: 

Branches are expensive on modern CPUs and memory access is expensive on embedded/older CPUs. So the flat just-assign will always be faster unless you have some kind of weird memory that takes longer to write than to read (hint: you don't).

It is worse for these reasons specifically:

  • A branching instruction. The processor may predict it away, but it still carries the possibility of overhead.
  • 2 memory accesses instead of 1. Reads and writes are the same speed on most forms of memory, so why do two when you can do one?
  • More code overhead. This is a micro one, but more instructions must be emitted for the if statement, which means an extra couple of memory reads and more space unnecessarily consumed in the cache.
  • And for the pessimistic: it could mean the C++ compiler decides to put this variable into a register instead of other, more necessary variables.
  • Also, even if you assume that b is put into a register: register reads/writes are very cheap, but they aren't free.
Earlz
Good point about branch (mis)prediction; does it still apply on ARM CPUs where if()'s like this are implemented without any branching?
Seva Alekseyev
It would still be barely slower because of the extra memory read. Even if you had some magic `read-and-set-false-if-true` instruction, the processor must still read the value from memory in order to test whether it is true. As for branching on ARM: no, branch misses don't really apply to conditional assignment. And on x86, `cmov` doesn't involve a branch, I believe.
Earlz
So the big downer of R-W - branch misprediction - does not apply on ARM. The big downer of W - cache flush - still does, doesn't it?
Seva Alekseyev
Yes, this may be true - if `b` isn't moved to a register (hint: a good compiler will usually optimize it into one). So on ARM and other processors that have a memory cache but no branch prediction, it may be slightly slower. But the second version is much more readable, and you instantly know what is intended.
Earlz
+1  A: 

On a modern pipelined processor you need to take this into account:

  • a mispredicted branch costs a lot
  • stores and loads take a long time
  • caches may speed up both reads and writes, but if it's a multi-cache architecture and b is being modified in more than one cache, multiple writes may mean multiple cache-line invalidations and may offset the benefit of the cache.

Read with conditional write has at least one memory access and a branch that may mispredict. Assuming the branch is taken 50% of the time, you have 1.5 memory accesses on average, plus the chance of mispredicting.

Unconditional write has exactly one memory access and no branch whatsoever.

Now you need to balance the cost of mispredicting with the cost of a store, which changes depending on how many cache agents you have.
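That balance can be written down as a toy cost model (all parameter names and values are illustrative, not measured):

```cpp
#include <cassert>

// Expected cost of read-then-conditionally-write:
//   p        - probability the if() condition is true
//   mem      - cost of one memory access
//   miss     - penalty of one branch misprediction
//   missrate - fraction of branches mispredicted
// One read always happens; a write happens p of the time.
constexpr double cost_conditional(double p, double mem,
                                  double miss, double missrate) {
    return (1.0 + p) * mem + missrate * miss;
}

// Unconditional write: exactly one memory access, no branch.
constexpr double cost_unconditional(double mem) {
    return 1.0 * mem;
}
```

With p = 0.5 and no mispredictions this reproduces the 1.5 vs. 1 memory accesses; any realistic misprediction rate and penalty shift the balance further toward the unconditional write.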

Nathan Fellman
Nice calculation!
pajton