The speed of data writes and reads is influenced by the performance of your local storage hierarchy. Modern CPUs have registers, two or more levels of cache (L1 and L2, often L3 as well), DRAM, and sometimes disk (via swap). If your access patterns and working-set size use the L1 cache effectively (i.e. the data is small and locally coherent), then once the data is in L1 it only needs to be loaded into registers to be accessed by the CPU. If the required data is in L2, it must first be brought into L1 before being loaded into a register for processing. The same applies going from DRAM to L2 to L1 to registers. Registers are faster than L1, L1 is faster than L2, and DRAM is slow by comparison.
Herb Sutter gave a talk that addresses these issues several years ago at NWCPP:
http://video.google.com/videoplay?docid=-4714369049736584770#
From a programming perspective, if your data fits inside a cache line and is repeatedly read or written, you will see higher performance because there are fewer cache misses (each miss forces a fetch from a more distant level of the hierarchy). This holds at every level of "cache", whether that is registers, L1, L2, DRAM, disk, or a far-off server.