Once again, I find myself with a set of broken assumptions. The article itself is about a 10x performance gain by modifying a proven-optimal algorithm to account for virtual memory:

On a modern multi-issue CPU, running at some gigahertz clock frequency, the worst-case loss is almost 10 million instructions per VM page fault. If you are running with a rotating disk, the number is more like 100 million instructions.

What good is an O(log2(n)) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n^2) algorithm, which avoids page faults, will run circles around it.

Are there more such algorithms around? Should we re-examine all those fundamental building blocks of our education? What else do I need to watch out for when writing my own?

Clarification:

The algorithm in question isn't faster than the proven-optimal one because the Big-O notation is flawed or meaningless. It's faster because the proven-optimal algorithm relies on an assumption that does not hold on modern hardware and operating systems, namely that all memory access is equal and interchangeable.
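To make that concrete, here is a rough, made-up micro-benchmark sketch (the array size and timing method are arbitrary, chosen only for illustration). It walks the same array once sequentially and once in a random order: both walks do exactly the same O(n) amount of work, yet once the data no longer fits in the caches the random walk is typically many times slower, because memory access is not, in fact, equal and interchangeable.

    // Sketch: identical asymptotic work, very different real cost.
    // (Illustrative only; the size and timing method are arbitrary.)
    #include <algorithm>
    #include <chrono>
    #include <cstdint>
    #include <iostream>
    #include <numeric>
    #include <random>
    #include <vector>

    // Follow a chain of indices through 'order', touching every element once.
    static double chase(const std::vector<std::uint32_t>& order) {
        std::uint32_t i = 0;
        volatile std::uint64_t sink = 0;   // keep the loop from being optimized away
        const auto t0 = std::chrono::steady_clock::now();
        for (std::size_t k = 0; k < order.size(); ++k) { sink = sink + i; i = order[i]; }
        const auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(t1 - t0).count();
    }

    int main() {
        const std::size_t n = std::size_t(1) << 24;   // ~16M elements, well past typical L3 cache

        // Sequential chain: element i points to i+1, last element wraps to 0.
        std::vector<std::uint32_t> seq(n);
        std::iota(seq.begin(), seq.end(), 1u);
        seq.back() = 0;

        // Random chain: a single random cycle over the same elements.
        std::vector<std::uint32_t> perm(n);
        std::iota(perm.begin(), perm.end(), 0u);
        std::shuffle(perm.begin() + 1, perm.end(), std::mt19937{42});
        std::vector<std::uint32_t> rnd(n);
        for (std::size_t k = 0; k + 1 < n; ++k) rnd[perm[k]] = perm[k + 1];
        rnd[perm[n - 1]] = perm[0];

        std::cout << "sequential: " << chase(seq) << " s\n"
                  << "random:     " << chase(rnd) << " s\n";
    }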

+3  A: 

One important thing to realize is that the most common use of big-O notation (to talk about runtime complexity) is only half of the story. The other half, namely space complexity, can also be expressed in big-O notation and can be just as relevant.

Generally these days, memory capacity advances have outpaced computing speed advances (for a single core - parallelization can get around this), so less focus is given to space complexity, but it's still a factor that should be kept in mind, especially on machines with more limited memory or if working with very large amounts of data.
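As a tiny illustration (not from the answer itself), here are two ways to check a list for duplicates: one trades extra time for O(1) additional space, the other trades O(n) additional space for expected-linear time. Which one wins in practice depends on how much memory you can spare and, tying back to the question, on how that memory behaves once caches and paging enter the picture.

    // Two ways to detect duplicates: a classic time/space trade-off.
    // (Illustrative sketch only.)
    #include <algorithm>
    #include <unordered_set>
    #include <vector>

    // O(n log n) time, O(1) extra space (the input is sorted in place).
    bool has_duplicate_sort(std::vector<int>& v) {
        std::sort(v.begin(), v.end());
        return std::adjacent_find(v.begin(), v.end()) != v.end();
    }

    // O(n) expected time, but O(n) extra space for the hash set.
    bool has_duplicate_hash(const std::vector<int>& v) {
        std::unordered_set<int> seen;
        for (int x : v)
            if (!seen.insert(x).second) return true;   // insert failed => already seen
        return false;
    }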

Amber
For space complexity, cache sizes are very important. While huge memory is abundant, it's also comparatively slow, so size does still matter. However, __correctness is way more important__ than speed.
sbi
+14  A: 

You only need to re-examine your algorithms when your customers complain about the slowness of your program or it misses critical deadlines. Otherwise focus on correctness, robustness, readability, and ease of maintenance. Until these are achieved, any performance optimization is a waste of development time.

Page faults and disk operations may be platform specific. Always profile your code to see where the bottlenecks are. Spending time on these areas will produce the most benefits.

If you're interested, along with page faults and slow disk operations, you may want to be aware of:

  • Cache hits -- Data Oriented Design (see the sketch at the end of this answer).
  • Cache misses -- Reducing unnecessary branches / jumps.
  • Cache prediction -- Shrink loops so they fit into the processor's cache.

Again, these items only matter after quality has been achieved, customers have complained, and a profiler has analyzed your program.
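For the Data Oriented Design bullet above, a minimal sketch (with a made-up Particle type) of the usual array-of-structs versus struct-of-arrays layouts: the work is identical, but the second layout lets the hot loop pull only the fields it actually uses through the cache.

    // "Data Oriented Design" in miniature: array-of-structs vs struct-of-arrays.
    // (Hypothetical Particle layout, purely for illustration.)
    #include <cstddef>
    #include <vector>

    // Array of structs: every position update also drags the unused
    // velocity/mass/charge fields through the cache.
    struct Particle {
        float x, y, z;
        float vx, vy, vz;
        float mass, charge;
    };

    void advance_aos(std::vector<Particle>& ps, float dt) {
        for (Particle& p : ps) {
            p.x += p.vx * dt;
            p.y += p.vy * dt;
            p.z += p.vz * dt;
        }
    }

    // Struct of arrays: the hot loop touches only the fields it needs,
    // so far more useful data fits in each cache line.
    struct Particles {
        std::vector<float> x, y, z;
        std::vector<float> vx, vy, vz;
        std::vector<float> mass, charge;
    };

    void advance_soa(Particles& ps, float dt) {
        for (std::size_t i = 0; i < ps.x.size(); ++i) {
            ps.x[i] += ps.vx[i] * dt;
            ps.y[i] += ps.vy[i] * dt;
            ps.z[i] += ps.vz[i] * dt;
        }
    }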

Thomas Matthews
A heart-felt __+1__ from me. If only the programmers of all the software I use would think of __correctness first__! What good does it do me if an application runs 20% faster, only to steal an hour a day from me through crashes and other forms of data loss? (Also, it's a well-known fact that users will happily wait 10% longer if those 10% are spent updating a progress bar. Which just goes to show that, generally, speed is overrated.)
sbi
Yes, profile. A good profiler (e.g. Apple's Shark) can even help you spot cache-unfriendly behavior when it actually occurs.
Michael E
Correctness is by far the most important issue. Robustness is almost as important. Readability and ease of maintenance are important, *AND* they all tend to go hand-in-hand, in that readable code tends to be a lot easier to maintain, and correct, robust code tends not to require nearly as much maintenance as the incorrect, fragile code that seems far too often to be Standard Operating Procedure these days.
John R. Strohm
Agree. But this is not an answer to the poster's question.
PeterAllenWebb
+1  A: 

There are no broken assumptions that I see. Big-O notation is a measure of algorithmic complexity on a very, very simplified, idealized computing machine, and it ignores constant terms. Obviously it is not the final word on actual speeds on actual machines.

GregS
+3  A: 

I'll expand on GregS's answer: the difference is between effective complexity and asymptotic complexity. Asymptotic complexity ignores constant factors and is valid only for “large enough” inputs. Oftentimes “large enough” can actually mean “larger than any computer can deal with, now and for a few decades”; this is where the theory (justifiably) gets a bad reputation. Of course there are also cases where “large enough” means n=3 !

A more complex (and thus more accurate) way of looking at this is to first ask “what is the size range of problems you are interested in?” Then you need to measure the efficiency of various algorithms in that size range, to get a feeling for the ‘hidden constants’. Or you can use finer methods of algorithmic asymptotics which actually give you estimates on the constants.

The other thing to look at is ‘transition points’. Of course an algorithm which runs in 2n^2 time will be faster than one that runs in 10^16 n log(n) time for all n < 1.99 * 10^17. So the quadratic algorithm will be the one to choose (unless you are dealing with the sizes of data that CERN worries about). Even lower-order terms can bite – 3n^3 is a lot better than n^3 + 10^16 n^2 for n < 5 * 10^15 (assuming that these are actual complexities).
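For the curious, the crossover points quoted above can be checked mechanically; the snippet below (plain arithmetic on the stated cost functions, not a benchmark) simply bisects for the point where the two costs meet.

    // Quick numeric check of the 'transition points' mentioned above.
    // (Arithmetic on the stated cost functions only; not a benchmark.)
    #include <cmath>
    #include <cstdio>

    int main() {
        // Where does 2n^2 stop beating 10^16 * n * log(n)?
        double lo = 2.0, hi = 1e20;          // quadratic wins at lo, loses at hi
        for (int i = 0; i < 200; ++i) {      // bisect on the crossover point
            const double n = 0.5 * (lo + hi);
            (2 * n * n < 1e16 * n * std::log(n) ? lo : hi) = n;
        }
        std::printf("2n^2 beats 1e16*n*log(n) up to n ~ %.3g\n", lo);          // ~1.99e17

        // 3n^3 vs n^3 + 10^16 n^2: equal where 2n^3 = 10^16 n^2, i.e. n = 5e15.
        std::printf("3n^3 beats n^3 + 1e16*n^2 up to n ~ %.3g\n", 1e16 / 2.0);
    }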

Jacques Carette
If you want big data, talk to some of the other sciences too. For example, earth science is almost entirely limited by how much data they can store and process. They tend to be less dependent on public funds though, so you hear less about them.
Donal Fellows
+1  A: 

O(n) is only part of the story -- a big part, and frequently the dominant part, but not always the dominant part. Once you get to performance optimization (which should not be done too early in your development), you need to consider all of the resources you are using. You can generalize Amdahl's Law to mean that your execution time will be dominated by the most limited resource. Note that this also means that the particular hardware on which you're executing must be considered. A program that is highly optimized and extremely efficient for a massively parallel computer (e.g., CM or MassPar) would probably not do well on a big vector box (e.g., Cray-2), nor on a high-speed microprocessor. That program might not even do well on a massive array of capable microprocessors (map/reduce style). The different optimizations for the different balances of cache, CPU communications, I/O, CPU speed, memory access, etc. mean different performance.

Back when I spent time working on performance optimizations, we would strive for "balanced" performance over the whole system. A super-fast CPU with a slow I/O system rarely made sense, and so on. O() typically considers only CPU complexity. You may be able to trade off memory space (unrolling loops doesn't make O() sense, but it does frequently help real performance); concern about cache hits vice linear memory layouts vice memory bank hits; virtual vs real memory; tape vs rotating disk vs RAID, and so on. If your performance is dominated by CPU activity, with I/O and memory loafing along, big-O is your primary concern. If your CPU is at 5% and the network is at 100%, maybe you can get away from big-O and work on I/O, caching, etc.
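As one concrete example of the "unrolling loops doesn't make O() sense, but it does frequently help" point, a rough sketch follows (modern compilers will often do this themselves at higher optimization levels, and note that the changed summation order can alter floating-point rounding slightly):

    // Loop unrolling: identical O(n) work, but less loop overhead and more
    // independent operations for the CPU to overlap. (Illustrative sketch.)
    #include <cstddef>

    float sum_plain(const float* a, std::size_t n) {
        float s = 0.0f;
        for (std::size_t i = 0; i < n; ++i) s += a[i];
        return s;
    }

    float sum_unrolled(const float* a, std::size_t n) {
        float s0 = 0, s1 = 0, s2 = 0, s3 = 0;   // independent accumulators
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; ++i) s0 += a[i];          // leftover elements
        return (s0 + s1) + (s2 + s3);
    }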

Multi-threading, particularly with multi-cores, makes all of that analysis even more complex. This opens up into very extensive discussion. If you are interested, Google can give you months or years of references.

mpez0
vs. is not short for vice, but for versus (http://www.askoxford.com/concise_oed/vs?view=uk)
Unreason