What advantages are there to programming for a non-cache-coherent multi-core machine? Cache_coherence has many benefits, but how would one take advantage of the opposite of this feature - an independent cache for each individual core. What programming paradigm and to what particular practical problems would such an architecture be beneficial over a cache-coherent one?
views:
158answers:
5What programming paradigm
Message passing.
and to what particular practical problems would such an architecture be beneficial over a cache-coherent one?
Pattern matching - the input block of memory could very well be "read-only": the "output" result can very well be placed in separate blocks waiting for a "reducer" of some sort.
Of course, this is just an example amongst many I am sure.
Just to make things clear: the principal reasons for going with "non-cache-coherent" architecture are cost & speed (assuming the problems at hand are more efficiently tackled using this architecture).
You can get a bit of extra performance, but you shoul never rely on each processor having different cache values, as you can never know when the cache is flushed.
I'm not an expert; but I don't think it has any advantage over a cache coherent architecture, besides from being simpler to implement. Of course, such simplicity can allow other optimizations that could be prohibitive in a more complex coherent system, making the non-coherent machine faster when carefully programmed.
said that, i concur with jldupont, message passing doesn't need coherency, so it's (almost) the mandatory way to do IPC.
You don't as such take advantage of cache non-coherence. You can't write code which relies on different cores having different views of memory, because a non-coherent cache doesn't guarantee to show different memory to different cores. It just reserves the right to do that.
Cache coherence costs circuits and time. Non-coherent caches are therefore cheaper (and cooler, perhaps?) and faster. Memory access might be faster in cycles, or might be the same best-case speed but with fewer stalls due to cache synchronisation and especially false sharing.
So it's not so much extra things you do to take advantage of non-coherence, it's the things that you don't have to do because you've dropped the disadvantages of coherence - you don't have to redesign your parallel code because it's spending all its time sitting around waiting for the result of a memory store from another core.
The downside on a non-coherent cache architecture at first appears to be that find yourself using additional synchronisation that's provided automatically by coherent caches. No double-checked locking for you. Then you realise that in effect, the coherent-cache architectures do this synchronisation (albeit in a super-fast hardware-implemented form) for every single memory access, and block if the cache line is dirty, whether you need it to or not. That cheers me right up :-)
You could think of the Cell SPE local memory as a sort of cache. It isn't cache really since it isn't automatic at all, but the speed is the same and it isn't coherent.
It has big speed advantages because the hardware does not need to spend any time synchronizing the cache line states between cores.
In a Cell, the programmer must do the synchronization manually by writing code to copy SPE local memory back and forth. So a disadvantage is much greater program complexity.