views: 218
answers: 7
Which is, on average, faster - check the value then, if needed, assign, or simply assign? Or, in C++ terms:

bool b;     // assume b already holds a value, true with 50% probability
if(b)
    b = false;

or

b = false;

Assume that the if() condition is true with 50% probability. The answer will most likely be highly architecture dependent - please voice your low-level considerations. Writing always dirties the cache line - right? So by avoiding a write we avoid a cache flush in half the cases. But a smart enough cache might detect a trivial write and not dirty itself. On the other hand, the unconditional write is always exactly one memory operation, while read-then-write is, on average, 1.5 operations.

Disclaimer: this is a curiosity question, not a problem I actually face.
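For the curious, the two variants can be wrapped in a small, hypothetical benchmark harness like the one below (all names are illustrative; real results will depend heavily on compiler optimization level, since a good optimizer may simplify or vectorize both loops):

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Variant 1: check the value, assign only if needed.
void cond_clear(std::vector<char>& v) {
    for (auto& f : v)
        if (f) f = 0;
}

// Variant 2: unconditional assign.
void uncond_clear(std::vector<char>& v) {
    for (auto& f : v)
        f = 0;
}

// Fill a buffer with values that are true with 50% probability,
// matching the assumption in the question.
std::vector<char> random_flags(std::size_t n, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);
    std::vector<char> v(n);
    for (auto& f : v) f = coin(rng) ? 1 : 0;
    return v;
}

// Time one variant over a buffer, in microseconds.
template <typename F>
long long time_us(F f, std::vector<char>& v) {
    auto t0 = std::chrono::steady_clock::now();
    f(v);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```

One would compare time_us(cond_clear, a) against time_us(uncond_clear, b) on two copies of the same random buffer, repeating the runs on the target architecture before drawing any conclusions.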

+1  A: 

It would definitely be worth profiling this on different architectures to get actual results.

ElectricDialect
+1  A: 

It depends on various things:

  • how predictable the branch is (in the first scenario)
  • whether b is already in a register
  • what architecture you are using
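To make the first bullet concrete: the check-then-assign loop can behave very differently on data with the same 50/50 mix of values but different orderings. A hypothetical pair of inputs (names are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Predictable pattern: first half true, second half false. The branch
// changes direction only once, so a branch predictor does very well.
std::vector<char> predictable(std::size_t n) {
    std::vector<char> v(n, 0);
    std::fill(v.begin(), v.begin() + n / 2, 1);
    return v;
}

// Unpredictable pattern: the same values shuffled, so each iteration's
// branch is close to a coin flip and mispredicts often.
std::vector<char> unpredictable(std::size_t n, unsigned seed = 1) {
    auto v = predictable(n);
    std::mt19937 rng(seed);
    std::shuffle(v.begin(), v.end(), rng);
    return v;
}
```

Timing `if (f) f = 0;` over each buffer would expose the predictor's effect; the unconditional `f = 0;` is insensitive to the ordering.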
Paul R
+1  A: 

In addition to the suggestions to profile, it also really depends on what memory is backing that write - if it's a memory-mapped flash device, for example, the write might be extremely costly.

Carl Norum
A: 

If you are assigning a pointer, reference, or basic value type, I personally think the direct assignment will be faster (keen to see the outcome on a profiler). In a 50% probability environment, the conditional version will potentially execute a lot more instructions than simply putting a value into a register. Assigning a struct or class object that triggers an assignment operator will be the most expensive. Conditional logic also introduces more instructions and adds to your code complexity metrics.

Fadrian Sudaman
+1  A: 

Recently I have been reading papers on very fast compression techniques, and the authors stressed the need to avoid if branching to achieve the best performance. The reason is CPU pipelining: using ifs breaks many of the optimizations a CPU can make to execute parts of the code in parallel. So, if you had a lot of these operations, it might be faster to use b = false.
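The specific `if (b) b = false;` already has a branch-free equivalent, `b = false;`, but for the general pattern `if (cond) x = val;` the usual branch-free rewrite is a ternary, which compilers commonly lower to a conditional move (e.g. cmov on x86). A sketch using a hypothetical clamp function:

```cpp
#include <cassert>

// Branchy form: may compile to a conditional jump that can mispredict.
int clamp_branchy(int x) {
    if (x < 0) x = 0;
    return x;
}

// Branch-free form: typically lowered to a conditional move,
// leaving nothing for the branch predictor to miss.
int clamp_branchless(int x) {
    return x < 0 ? 0 : x;
}
```

Both forms compute the same result; only the generated instructions (and thus the pipeline behavior) may differ.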

pajton
+4  A: 

Branches are expensive on modern CPUs and memory access is expensive on embedded/older CPUs. So the flat just-assign will always be faster unless you have some kind of weird memory that takes longer to write than to read (hint: you don't).

It is worse for these reasons specifically:

  • A branching instruction. The processor may predict it away, but it still carries the possibility of overhead.
  • 2 memory accesses instead of 1. Reads and writes are the same speed on most forms of memory, so why do two when you can do one?
  • More code overhead. This is a micro one, but more instructions must be emitted for the if statement, which means an extra couple of memory reads and more space unnecessarily consumed in the cache.
  • And for the pessimistic: it could mean the C++ compiler decides to put this variable into a register instead of other, more necessary variables.
  • Also, even if you assume that b is put into a register: register reads/writes are very cheap, but they aren't free.
Earlz
Good point about branch (mis)prediction; does it still apply on ARM CPUs where if()'s like this are implemented without any branching?
Seva Alekseyev
It would still be barely slower because of the extra memory read. Even if you had some magic `read-and-set-false-if-true` instruction, the processor must still read the value from memory in order to test whether it is true. As for branching on ARM: no, branch misses don't really apply to conditional assignment. And on x86, `cmov` doesn't involve a branch, I believe.
Earlz
So the big downer of R-W - branch misprediction - does not apply on ARM. The big downer of W - cache flush - still does, doesn't it?
Seva Alekseyev
Yes, this may be true - if `b` isn't moved to a register (hint: a good compiler will usually optimize it into one). So on ARM and other processors that have a memory cache but no branch prediction, it may be slightly slower. But the second version is much more readable, and you instantly know what is intended.
Earlz
+1  A: 

On a modern pipelined processor you need to take this into account:

  • a mispredicted branch costs a lot
  • stores and loads take a long time
  • caches may speed up both reads and writes, but if it's a multi-cache architecture and b is being modified in more than one cache, multiple writes may mean multiple cache-line invalidations and may offset the benefit of the cache.

Read with conditional write has at least one memory access and a branch that may mispredict. Assuming the branch is taken 50% of the time, you have 1.5 memory accesses on average, plus the chance of mispredicting.

Unconditional write has exactly one memory access and no branch whatsoever.

Now you need to balance the cost of mispredicting with the cost of a store, which changes depending on how many cache agents you have.
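That balance can be written down as a toy cost model (all parameter names and values are illustrative, not measured):

```cpp
#include <cassert>

// Expected cost of read-then-conditionally-write:
//   p        - probability the if() condition is true
//   mem      - cost of one memory access
//   miss     - penalty of one branch misprediction
//   missrate - fraction of branches mispredicted
// One read always happens; a write happens p of the time.
constexpr double cost_conditional(double p, double mem,
                                  double miss, double missrate) {
    return (1.0 + p) * mem + missrate * miss;
}

// Unconditional write: exactly one memory access, no branch.
constexpr double cost_unconditional(double mem) {
    return 1.0 * mem;
}
```

With p = 0.5 and no mispredictions this reproduces the 1.5 vs. 1 memory accesses; any realistic misprediction rate and penalty shift the balance further toward the unconditional write.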

Nathan Fellman
Nice calculation!
pajton