Is Interlocked.Increment(ref x) faster or slower than x++ for ints and longs on various platforms?
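views: 1728
answers: 5
It will always be slower, because it has to perform a locked read-modify-write on memory (traditionally described as a CPU bus lock) rather than just updating a register. However, on modern CPUs an uncontended interlocked operation is cheap enough that the difference is often negligible, even in real-time processing.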
It's slower. However, it's the most performant general way I know of for achieving thread safety on scalar variables.
Think about it for a moment, and you'll realize an Increment call cannot be any faster than a simple application of the increment operator. If it were, then the compiler's implementation of the increment operator would call Increment internally, and they'd perform the same.
But, as you can see by testing it for yourself, they don't perform the same.
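For anyone who wants to try that test, here is a rough sketch of a single-threaded timing comparison. The class and method names (IncrementTiming, TimePlain, TimeInterlocked) and the iteration count are made up for illustration. The JIT may optimize the plain loop heavily, so treat the numbers as a rough indication rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper class, for illustration only.
public static class IncrementTiming
{
    // Times `iterations` plain increments of a local int; returns elapsed ms.
    public static long TimePlain(int iterations, out int result)
    {
        int x = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            x++; // ordinary increment; the JIT is free to keep x in a register
        }
        sw.Stop();
        result = x;
        return sw.ElapsedMilliseconds;
    }

    // Times `iterations` interlocked increments of a local int; returns elapsed ms.
    public static long TimeInterlocked(int iterations, out int result)
    {
        int x = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Interlocked.Increment(ref x); // atomic read-modify-write
        }
        sw.Stop();
        result = x;
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int N = 100_000_000; // arbitrary iteration count
        long plainMs = TimePlain(N, out int a);
        long lockedMs = TimeInterlocked(N, out int b);
        Console.WriteLine($"x++:                   {plainMs} ms (x = {a})");
        Console.WriteLine($"Interlocked.Increment: {lockedMs} ms (x = {b})");
    }
}
```

This only measures the uncontended case. With several threads hitting the same variable, the interlocked version would be expected to slow down further.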
The two options have different purposes. Use the increment operator generally. Use Increment when you need the operation to be atomic and you're sure all other users of that variable are also using interlocked operations. (If they're not all cooperating, then it doesn't really help.)
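To make the difference in purpose concrete, here is a minimal sketch that bumps one shared counter from several threads, once with ++ and once with Interlocked.Increment. The names (AtomicCounterDemo, RunOnThreads and so on) and the thread and iteration counts are hypothetical. The ++ version can lose updates, so its total may come out lower than expected; the interlocked version should always hit the exact total.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, for illustration only.
public static class AtomicCounterDemo
{
    // Starts `threads` threads running `body` and waits for all of them to finish.
    private static void RunOnThreads(int threads, ThreadStart body)
    {
        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(body);
            workers[t].Start();
        }
        foreach (var w in workers) w.Join();
    }

    // Non-atomic: concurrent read-modify-write cycles can overwrite each other.
    public static int CountWithPlainIncrement(int threads, int perThread)
    {
        int counter = 0;
        RunOnThreads(threads, () =>
        {
            for (int i = 0; i < perThread; i++) counter++;
        });
        return counter;
    }

    // Atomic: every increment is applied exactly once.
    public static int CountWithInterlocked(int threads, int perThread)
    {
        int counter = 0;
        RunOnThreads(threads, () =>
        {
            for (int i = 0; i < perThread; i++) Interlocked.Increment(ref counter);
        });
        return counter;
    }

    public static void Main()
    {
        const int Threads = 4, PerThread = 1_000_000; // arbitrary sizes
        Console.WriteLine($"expected:    {Threads * PerThread}");
        Console.WriteLine($"x++:         {CountWithPlainIncrement(Threads, PerThread)}");
        Console.WriteLine($"Interlocked: {CountWithInterlocked(Threads, PerThread)}");
    }
}
```

Mixing the two styles on the same variable gives the lossy behaviour again, which is the "all cooperating" caveat above.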
It is slower since it forces the action to occur atomically and it acts as a memory barrier, eliminating the processor's ability to re-order memory accesses around the instruction.
You should be using Interlocked.Increment when you want the action to be atomic on state that can be shared between threads - it's not intended to be a full replacement for x++.
In our experience, InterlockedIncrement() and its relatives on Windows have a quite significant performance impact. In one sample case we were able to eliminate the interlock and use ++/-- instead. This alone reduced run time from 140 seconds to 110 seconds. My analysis is that the interlock forces a memory round trip (otherwise how could other cores see it?). An L1 cache read/write is around 10 clock cycles, but a memory read/write is more like 100.
In this sample case, I estimated the number of increment/decrement operations at about 1 billion. So on a 2 GHz CPU this is something like 5 seconds for the ++/--, and 50 seconds for the interlock. Spread the difference across several threads, and it's close to 30 seconds.
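Spelled out, the back-of-the-envelope estimate goes roughly like this, taking the figures above as given (about 10^9 operations, about 10 cycles per L1 access, about 100 cycles per memory access, a 2 GHz clock). The symbol names are just labels for this sketch.

```latex
\begin{align*}
t_{\text{++}} &\approx \frac{10^{9}\ \text{ops} \times 10\ \text{cycles/op}}{2 \times 10^{9}\ \text{cycles/s}} = 5\ \text{s} \\
t_{\text{interlocked}} &\approx \frac{10^{9}\ \text{ops} \times 100\ \text{cycles/op}}{2 \times 10^{9}\ \text{cycles/s}} = 50\ \text{s} \\
t_{\text{interlocked}} - t_{\text{++}} &\approx 45\ \text{s}
\end{align*}
```

That 45 seconds is total CPU time; with the work shared across several threads, a wall-clock saving near the observed 30 seconds is plausible.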