In the article "Teach Yourself Programming in Ten Years", Peter Norvig (Director of Research, Google) gives the following approximate timings for various operations on a typical 1 GHz PC back in 2001:

  • execute single instruction = 1 nanosec = (1/1,000,000,000) sec
  • fetch word from L1 cache memory = 2 nanosec
  • fetch word from main memory = 10 nanosec
  • fetch word from consecutive disk location = 200 nanosec
  • fetch word from new disk location (seek) = 8,000,000 nanosec = 8 millisec

What would the corresponding timings be for what you'd consider a typical desktop PC in 2010?
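
For anyone who would rather measure than guess, the easiest of these numbers to probe from portable code is the main-memory fetch. Below is a minimal C++ sketch of my own (not from Norvig's article; the buffer size, step count, and RNG seed are arbitrary) that chases a randomly permuted pointer chain, so every load depends on the previous one and pays the full memory latency:

    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <utility>
    #include <vector>

    int main() {
        const std::size_t n = std::size_t(1) << 24;   // 16M entries (~128 MB), far larger than any cache
        std::vector<std::size_t> next(n);
        std::iota(next.begin(), next.end(), std::size_t(0));

        // Sattolo's algorithm: a single n-cycle, so the chase never closes early.
        std::mt19937_64 rng(42);
        for (std::size_t k = n - 1; k > 0; --k) {
            std::uniform_int_distribution<std::size_t> pick(0, k - 1);
            std::swap(next[k], next[pick(rng)]);
        }

        const std::size_t steps = 10000000;
        std::size_t i = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t s = 0; s < steps; ++s)
            i = next[i];                              // each load depends on the previous one
        auto t1 = std::chrono::steady_clock::now();

        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        std::printf("~%.1f ns per dependent load (ended at %zu)\n", ns / steps, i);
    }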

+3  A: 

Cache and main memory have gotten faster. Disks have higher sequential bandwidth. And SSDs have much lower seek time.

The original list is pretty crummy, though: he's mixing latency measures (like seek time) with 1/throughput figures. You're dreaming if you think you can round-trip to the disk controller in 200 ns, even when the data is already in its cache and needs no head movement.
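
You can see the same confusion in RAM terms. A streaming sum reports a per-word cost far below the true memory latency, precisely because it measures 1/throughput with many loads in flight at once; the chase sketch in the question measures latency, with only one load outstanding at a time. A minimal sketch of my own (buffer size arbitrary):

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t n = std::size_t(1) << 24;   // ~128 MB of 8-byte words
        std::vector<std::uint64_t> a(n, 1);

        auto t0 = std::chrono::steady_clock::now();
        std::uint64_t s = 0;
        for (std::size_t i = 0; i < n; ++i)
            s += a[i];                                // independent loads, so they overlap
        auto t1 = std::chrono::steady_clock::now();

        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        std::printf("~%.2f ns per word, sequential (sum=%llu) -- 1/throughput, not latency\n",
                    ns / n, (unsigned long long)s);
    }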

None of the latencies have really changed. The single instruction and L1 latency are actually longer than the figures he gave, but you get multiple instructions working in parallel (pipelining) and several words fetched from cache for the price of one. Similarly for disk transfers, you'll get consecutive blocks delivered in much more rapid succession, but the wait time after issuing a request hasn't changed much, unless you've moved to SSD.
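The "several words for the price of one" effect is easy to demonstrate: touching one word per cache line costs nearly as much as touching every word, because the whole line is fetched either way. A sketch of my own, assuming 64-byte lines (so 16 four-byte ints per line) and a compiler that doesn't elide the loop:

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t n = std::size_t(1) << 25;   // 32M 4-byte ints, ~128 MB
        std::vector<std::uint32_t> a(n, 1);

        for (std::size_t stride = 1; stride <= 16; stride *= 16) {  // 16 ints = one 64-byte line
            auto t0 = std::chrono::steady_clock::now();
            std::uint64_t s = 0;
            for (std::size_t i = 0; i < n; i += stride)
                s += a[i];
            auto t1 = std::chrono::steady_clock::now();
            double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
            std::printf("stride %2zu: %6.1f ms (sum=%llu)\n", stride, ms,
                        (unsigned long long)s);
        }
    }

Stride 16 does 1/16th of the additions but usually takes nearly as long as stride 1, because both fetch every line.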

CPU architecture has changed enough, though, that trying to put a single number on any of these is a loss. Different instructions require very different execution times, and data dependencies control the throughput you see. Cache behavior is dominated by the cost of sharing between multi-core CPUs. And so on.
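
For instance, the same number of floating-point additions runs several times faster split across independent accumulators than chained through one: the chain serializes on the add latency, while the split version pipelines. A sketch of my own (iteration count arbitrary; compile without -ffast-math, so the compiler can't reassociate the chained adds):

    #include <chrono>
    #include <cstdio>

    int main() {
        const long iters = 1L << 28;        // ~268M additions each way
        volatile double vx = 1.0;           // volatile keeps the compiler from folding the loops
        const double x = vx;

        auto t0 = std::chrono::steady_clock::now();
        double s = 0;
        for (long i = 0; i < iters; ++i)
            s += x;                         // one chain: every add waits on the previous result
        auto t1 = std::chrono::steady_clock::now();

        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        for (long i = 0; i < iters; i += 4) {
            s0 += x; s1 += x; s2 += x; s3 += x;   // four independent chains pipeline freely
        }
        auto t2 = std::chrono::steady_clock::now();

        std::printf("chained: %.2fs   4-way split: %.2fs   (sums %.0f, %.0f)\n",
                    std::chrono::duration<double>(t1 - t0).count(),
                    std::chrono::duration<double>(t2 - t1).count(),
                    s, s0 + s1 + s2 + s3);
    }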

Ben Voigt
+1 for flagging the confusion between latency and throughput
Paul R
+2  A: 

The main thing I would take away from those timings is the difference in scale between them: memory is around an order of magnitude slower than executing code directly on the CPU, and disk is several orders of magnitude slower than that.

Many developers still think of optimization purely in terms of CPU time. But given the timings above, a single cache miss leaves the CPU idle for at least 10 clock cycles, and a hard page fault costs around 8 million. This is why optimizing memory usage (to reduce page faults) and data layout (to reduce cache misses) will often pay off more than any optimization that focuses on code flow alone.
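
A concrete illustration of the data-layout point: summing a row-major matrix along its rows walks memory sequentially, while summing along its columns strides past a whole row per access and misses the cache almost every time. Same arithmetic, very different running time. A minimal sketch of my own (matrix size arbitrary, chosen to exceed any cache):

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t dim = 8192;                 // 8192x8192 ints, ~256 MB
        std::vector<std::uint32_t> m(dim * dim, 1);

        auto sum = [&](bool by_rows) {
            auto t0 = std::chrono::steady_clock::now();
            std::uint64_t s = 0;
            for (std::size_t i = 0; i < dim; ++i)
                for (std::size_t j = 0; j < dim; ++j)
                    s += by_rows ? m[i * dim + j]     // walks memory sequentially
                                 : m[j * dim + i];    // jumps a whole row per access
            auto t1 = std::chrono::steady_clock::now();
            std::printf("%s: %.2fs (sum=%llu)\n",
                        by_rows ? "row-major   " : "column-major",
                        std::chrono::duration<double>(t1 - t0).count(),
                        (unsigned long long)s);
        };
        sum(true);
        sum(false);
    }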

Michael
+1 I concur ...
High Performance Mark