Taking Peter Norvig's advice, I am pondering the question:

How much time does it take to fetch one word from memory, with and without a cache miss?

(Assume standard hardware and architecture. To simplify calculations, assume a 1 GHz clock.)

+4  A: 

Seems like Norvig answers this himself:

execute typical instruction         1/1,000,000,000 sec = 1 nanosec
fetch from L1 cache memory          0.5 nanosec
branch misprediction                5 nanosec
fetch from L2 cache memory          7 nanosec
Mutex lock/unlock                   25 nanosec
fetch from main memory              100 nanosec
send 2K bytes over 1Gbps network    20,000 nanosec
read 1MB sequentially from memory   250,000 nanosec
fetch from new disk location (seek) 8,000,000 nanosec
read 1MB sequentially from disk     20,000,000 nanosec
send packet US to Europe and back   150 milliseconds = 150,000,000 nanosec 

The part where it says "execute typical instruction" = 1 ns implies a 1 GHz CPU (assuming efficient pipelining, of course).

I don't know where he got this information, but I trust Peter Norvig to be reliable :-)

Eli Bendersky
What about some justification for the answer? The hard numbers are, more or less, known.
Yuval A
@Yuval - good question. But you didn't specify in your original question that you're interested in the "why", only the "what". Also, exactly what kind of justification are you looking for here? Explaining *why* main memory takes 100 ns? As in explaining all the logical circuits playing a part?
Eli Bendersky
I find the mutex vs. main-memory-fetch comparison interesting and surprising. I'd have guessed that the at-least-four-times-slower relationship would run the other way: I'd have thought a typical mutex lock would *require* a main-memory fetch. Obviously I'm wrong - I'll have to adjust some of my views about multitasking overheads, I guess.
Steve314
+1  A: 

Fair summary here, with some imprecision. When written (2+ years ago), for a mid-range PC of the time, it estimated: memory access, 60 ns; L1 cache, 10 ns; L2 cache, 20-30 ns (no estimates for L3 access times). It all varies a lot, of course, depending on contention and access patterns: since cache layers are typically filled "by lines" from slower memory, if you access address X and then address X+1, the second access may be a bit faster because the cache-line fill was already started by the first access.

And, of course, a high-end, well-tuned server will be much faster (relative differences between machines in memory-access latency are typically much larger than the differences in "raw" CPU speed).

Alex Martelli