What are some examples where Big-O notation[1] fails in practice?

That is to say: when will the Big-O running time of algorithms predict algorithm A to be faster than algorithm B, yet in practice algorithm B is faster when you run it?

Slightly broader: when do theoretical predictions about algorithm performance mismatch observed running times? A non-Big-O prediction might be based on the average/expected number of rotations in a search tree, or the number of comparisons in a sorting algorithm, expressed as a factor times the number of elements.

Clarification:

Despite what some of the answers say, the Big-O notation is meant to predict algorithm performance. That said, it's a flawed tool: it only speaks about asymptotic performance, and it blurs out the constant factors. It does this for a reason: it's meant to predict algorithmic performance independent of which computer you execute the algorithm on.

What I want to know is this: when do the flaws of this tool show themselves? I've found Big-O notation to be reasonably useful, but far from perfect. What are the pitfalls, the edge cases, the gotchas?

An example of what I'm looking for: running Dijkstra's shortest path algorithm with a Fibonacci heap instead of a binary heap, you get O(m + n log n) time versus O((m+n) log n), for n vertices and m edges. You'd expect a speed increase from the Fibonacci heap sooner or later, yet said speed increase never materialized in my experiments.

(Experimental evidence, without proof, suggests that binary heaps operating on uniformly random edge weights spend O(1) time rather than O(log n) time; that's one big gotcha for the experiments. Another one that's a bitch to count is the expected number of calls to DecreaseKey).
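
For concreteness, a minimal binary-heap Dijkstra sketch in Python (heapq with lazy deletion instead of an explicit DecreaseKey; just an illustration, not the exact code from the experiments):

```python
import heapq

def dijkstra(adj, source):
    """Binary-heap Dijkstra; adj[u] is a list of (v, weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]                       # (tentative distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue                           # stale entry: the "lazy" DecreaseKey
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))  # push a new entry instead of DecreaseKey
    return dist

# dijkstra({0: [(1, 2), (2, 5)], 1: [(2, 1)], 2: []}, 0) -> {0: 0, 1: 2, 2: 3}
```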

[1] Really it isn't the notation that fails, but the concepts the notation stands for, and the theoretical approach to predicting algorithm performance.

On the accepted answer:

I've accepted an answer to highlight the kind of answers I was hoping for. Many different answers that are just as good exist :) What I like about the accepted answer is that it suggests a general rule for when Big-O notation "fails" (when cache misses dominate execution time), which might also deepen understanding (in some sense I'm not sure how best to express at the moment).

+1  A: 

When your data doesn't fit the model, Big-O notation will still hold, but you're going to see an overlap between the best- and worst-case scenarios.

Also, some operations are tuned for linear data access vs. random data access, so an algorithm that is superior in terms of cycles might be dog slow if it's called in a way its design didn't anticipate. Similarly, if an algorithm causes page/cache misses due to the way it accesses memory, Big-O isn't going to give an accurate estimate of the cost of running a process.

Apparently, as I forgot to mention, it also fails when N is small :)

altCognito
And I'll note that while I'm referring to timing and slow vs. fast, a lot of the interest in Big-O notation certainly comes from expectations about how fast code will execute, even though Big-O doesn't directly refer to speed.
altCognito
+4  A: 

This somewhat depends on what the Big-O is measuring: when it's the worst-case scenario, it will usually "fail" in that the runtime performance will be much better than the Big-O suggests. If it's the average case, then the runtime may be much worse.

Big-O notation typically "fails" when there is prior information about the algorithm's input data. Often, the Big-O refers to the worst-case complexity, which tends to occur when the data is either completely random or completely non-random.

As an example, if the algorithm's Big-O was derived from profiling on randomized data but your data has a very well-defined structure, your measured times may be much faster than expected. By the same token, if you're measuring average complexity and you feed in data that is horribly randomized, the algorithm may perform much worse than expected.

Reed Copsey
From what I was taught, you can never make assumptions about the input. You can only talk about average running time (or expected running time) if you do some randomization inside your algorithm. Therefore any pattern in the input should not affect the average running time.
Matthijs Wessels
You can't make assumptions about the input, which is why it's typically "worst-case" or "average" running time, but that's my point - often, in the real world, the input already follows some pattern, so the predicted complexity doesn't match up with actual computation time.
Reed Copsey
+26  A: 

When N is small, the constant factor dominates. Looking up an item in an array of five items is probably faster than looking it up in a hash table.
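
A rough way to see this with Python's timeit (exact numbers depend on the runtime; in lower-level languages the small array's cache friendliness makes the gap even clearer):

```python
import timeit

small_list = [3, 1, 4, 1, 5]
small_set = set(small_list)
big_list = list(range(100_000))
big_set = set(big_list)

# At n = 5 the linear scan and the hash lookup are in the same ballpark --
# the constant factor decides. At n = 100,000 the O(n) scan loses badly.
print(timeit.timeit(lambda: 5 in small_list, number=100_000))
print(timeit.timeit(lambda: 5 in small_set, number=100_000))
print(timeit.timeit(lambda: 99_999 in big_list, number=1_000))
print(timeit.timeit(lambda: 99_999 in big_set, number=1_000))
```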

mquander
+17  A: 

Short answer: When n is small. The Traveling Salesman Problem is quickly solved when you only have three destinations (however, finding the smallest number in a list of a trillion elements can take a while, even though this is only O(n)).

balpha
+8  A: 
  1. For most algorithms there is an "average case" and a "worst case". If your data routinely falls into the "worst case" scenario, it is possible that another algorithm, while theoretically less efficient in the average case, might prove more efficient for your data.

  2. Some algorithms also have best cases that your data can take advantage of. For example, some sorting algorithms have a terrible theoretical efficiency, but are actually very fast if the data is already sorted (or nearly so). Another algorithm, while theoretically faster in the general case, may not take advantage of the fact that the data is already sorted and perform worse in practice (see the sketch after this list).

  3. For very small data sets sometimes an algorithm that has a better theoretical efficiency may actually be less efficient because of a large "k" value.
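
As an illustration of point 2, insertion sort is the classic case: O(n^2) in general, but linear on input that is already sorted. A quick Python sketch:

```python
import random
import timeit

def insertion_sort(a):
    a = list(a)
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:   # on already-sorted input this loop never runs
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

already_sorted = list(range(2000))
shuffled = random.sample(already_sorted, len(already_sorted))

# Same algorithm, same worst-case Big-O, wildly different observed times:
print(timeit.timeit(lambda: insertion_sort(already_sorted), number=5))
print(timeit.timeit(lambda: insertion_sort(shuffled), number=5))
```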

Eric Petroelje
Note that in case #2, if you specify your problem more precisely ("sorting a mostly sorted list") Big-O gives you the right answer.
Captain Segfault
@Segfault - very true, but the point is more that algorithms have "extreme" cases to their performance, both good extremes and bad extremes. So a typical case big-O doesn't always tell the whole story. Knowing about your data and about the fringe cases of the available algorithms can sometimes lead you to choose a different algorithm than what might be optimal in the typical case.
Eric Petroelje
+13  A: 

The canonical example is Quicksort, which has a worst-case time of O(n^2), while Heapsort's is O(n log n). In practice, however, Quicksort is usually faster than Heapsort. Why? Two reasons:

  • Each iteration in Quicksort is a lot simpler than in Heapsort. What's more, it's easily optimized by simple cache strategies.

  • The worst case is very hard to hit.

But IMHO, this doesn't mean 'Big O fails' in any way. The first factor (iteration time) is easy to incorporate into your estimates; after all, Big-O numbers should be multiplied by this almost-constant factor.

The second factor melts away if you use amortized figures instead of averages. They can be harder to estimate, but they tell a more complete story.
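
A rough pure-Python harness for this comparison (interpreter overhead blurs the cache effects that help Quicksort in compiled code, but the simpler inner loop usually still shows):

```python
import random
import timeit

def quicksort(a, lo=0, hi=None):
    if hi is None:
        a, hi = list(a), len(a) - 1
    if lo < hi:
        k = random.randint(lo, hi)        # random pivot: the worst case is very hard to hit
        a[k], a[hi] = a[hi], a[k]
        pivot, store = a[hi], lo
        for i in range(lo, hi):           # simple inner loop (Lomuto partition)
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        quicksort(a, lo, store - 1)
        quicksort(a, store + 1, hi)
    return a

def heapsort(a):
    a = list(a)
    def sift_down(root, end):             # heavier per-step bookkeeping than the partition loop
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and a[child] < a[child + 1]:
                child += 1
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return
    for start in range(len(a) // 2 - 1, -1, -1):
        sift_down(start, len(a) - 1)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end - 1)
    return a

data = [random.random() for _ in range(20_000)]
print(timeit.timeit(lambda: quicksort(data), number=3))
print(timeit.timeit(lambda: heapsort(data), number=3))
```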

Javier
Note: quicksort takes *expected* O(n log n) time. The use of "expected" is due to the randomness in the partition algorithm. You can also partition around the median, deterministically, in O(n) time, for a worst-case O(n log n) bound on quicksort.
Jonas Kölker
+77  A: 

It fails in exactly one case: When people try to use it for something it's not meant for.

It tells you how an algorithm scales. It does not tell you how fast it is.

Big-O notation doesn't tell you which algorithm will be faster in any specific case. It only tells you that for sufficiently large input, one will be faster than the other.

jalf
+1 Exactly. Others mention the constant factor, but miss that Big-O is ultimately a measure of scalability.
patros
Right. Big-O notation doesn't fail. People fail to understand it and apply it correctly.
Bill the Lizard
I disagree about "what it's meant for." Big-O may be a measure of scalability, but the question people have about their software is not "how scalable is this" but "how fast is this." Big-O, along with your knowledge about the speed of various operations and your understanding of data structures, helps you estimate how fast it is and how you might make it faster.
mquander
"It does not tell you fast it is... it only tells you... one will be fast than the other..." Maybe what you meant was "Given a sufficiently large input, **Big-O can tell you how the # of operations performed will change**"? Indeed this does *generally* relate to speed, otherwise most people would not be interested, but I understand the point you are trying to make here.
altCognito
@mquander: If you need to know whether algorithm A is faster than B in your specific situation, then big-o notation doesn't help you. You need to test and benchmark your implementations then. Big-O notation simply tells you how they're each going to behave as the problem size increases. If the question people have is "how fast is this", then they shouldn't look to big-O notation for the answer.
jalf
Benchmarks and profilers are great tools, but it's also good to have mental tools for reasoning about your code and estimating how you can improve it. The Big-O order of an algorithm is a succinct way of expressing a structural feature of the algorithm that has a big impact on performance in a lot of cases, and it's an important mental tool for making those estimates. It feels like a big nitpick to say that it doesn't tell you "how fast it is."
mquander
I disagree that people just want to know how fast their code is. People also want to know how their code will scale. Big O is not for "how fast," it's for scaling. It's not for small N, it's for bigger and bigger N.
Nosredna
+7  A: 

One area where Big O fails is memory access patterns. Big O only counts operations that need to be performed - it can't keep track of whether an algorithm results in more cache misses or more data that needs to be paged in from disk. For small N, these effects will typically dominate. For instance, a linear search through an array of 100 integers will probably beat out a search through a binary tree of 100 integers due to memory accesses, even though the binary tree will most likely require fewer operations. Each tree node would result in a cache miss, whereas the linear search would mostly hit the cache for each lookup.

Michael
That's at worst a constant factor, and only applies for small n (although a larger n than might otherwise be the case).
David Thornley
@David After all the question was, when does Big-O notation fail to predict performance.
altCognito
Yes. The general answer is "for small enough n". Things like memory access patterns and bookkeeping only affect the size of the n that counts as too small.
David Thornley
@David - That was called out in my answer, "For small N, these effects will typically dominate." I wanted to give a specific case, since there were several answers already that discussed how big O only estimates growth of a function's execution time.
Michael
Actually... Big-O notation is also used for disk accesses. Normally it is used to specify computational steps, but there is a whole field of research that focuses on decreasing the number of disc accesses. There they use Big-O to denote the number of disc accesses. An example of a course on this: http://www.win.tue.nl/~hermanh/teaching/2IL35/
Matthijs Wessels
+7  A: 

Big-O describes the efficiency/complexity of the algorithm and not necessarily the running time of the implementation of a given block of code. This doesn't mean Big-O fails. It just means that it's not meant to predict running time.

Check out the answer to this question for a great definition of Big-O.

Dennis Palmer
+6  A: 

One example (that I'm not an expert on) is that simplex algorithms for linear programming have exponential worst-case complexity on arbitrary inputs, even though they perform well in practice. An interesting solution to this is considering "smoothed complexity", which blends worst-case and average-case performance by looking at small random perturbations of arbitrary inputs.

Spielman and Teng (2004) were able to show that the shadow-vertex simplex algorithm has polynomial smoothed complexity.

othercriteria
+1  A: 

This question is like asking, "When does a person's IQ fail in practice?" It's clear that having a high IQ does not mean you'll be successful in life and having a low IQ does not mean you'll perish. Yet, we measure IQ as a means of assessing potential, even if it's not an absolute.

In algorithms, the Big-Oh notation gives you the algorithm's IQ. It doesn't necessarily mean that the algorithm will perform best for your particular situation, but there's some mathematical basis that says this algorithm has some good potential. If Big-Oh notation were enough to measure performance you would see a lot more of it and less runtime testing.

Think of Big-Oh as a range instead of a specific measure of better-or-worse. There are best-case scenarios, worst-case scenarios, and a huge set of scenarios in between. Choose your algorithms by how well they fit within the Big-Oh range, but don't rely on the notation as an absolute for measuring performance.

Kai
+3  A: 
  1. Small N - And for today's computers, 100 is likely too small to worry about.
  2. Hidden Multipliers - e.g., merge sort vs. quicksort.
  3. Pathological Cases - Again, merge sort vs. quicksort.
Sanjaya R
Hidden multipliers only affect how small is small.
David Thornley
+3  A: 

One broad area where Big-Oh notation fails is when the amount of data exceeds the available amount of RAM.

Using sorting as an example, the amount of time it takes to sort is not dominated by the number of comparisons or swaps (of which there are O(n log n) and O(n), respectively, in the optimal case). The amount of time is dominated by the number of disk operations: block writes and block reads.

To better analyze algorithms which handle data in excess of available RAM, the I/O-model was born, where you count the number of disk reads. In that, you consider three parameters:

  • The number of elements, N;
  • The amount of memory (RAM), M (the number of elements that can be in memory); and
  • The size of a disk block, B (the number of elements per block).

Notably absent is the amount of disk space; this is treated as if it were infinite. A typical extra assumption is that M > B^2.

Continuing the sorting example, you typically favor merge sort in the I/O case: divide the elements into chunks of size θ(M) and sort them in memory (with, say, quicksort). Then, merge θ(M/B) of them by reading the first block from each chunk into memory, stuff all the elements into a heap, and repeatedly pick the smallest element until you have picked B of them. Write this new merge block out and continue. If you ever deplete one of the blocks you read into memory, read a new block from the same chunk and put it into the heap.

(All expressions should be read as being big θ.) You form N/M sorted chunks, which you then merge. You merge them log_(M/B)(N/M) times; each pass reads and writes all N/B blocks, so the total cost is (N/B) * log_(M/B)(N/M) block transfers.

You can analyze in-memory sorting algorithms (suitably modified to include block reads and block writes) and see that they're much less efficient than the merge sort I've presented.
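
For illustration, a compact Python sketch of the merge phase described above, with in-memory lists standing in for disk (a single merge pass over all chunks; a full implementation would merge only M/B chunks per pass and repeat):

```python
import heapq

def external_merge_sort(data, M=1000, B=100):
    # Phase 1: sort chunks of size M "in memory".
    chunks = [sorted(data[i:i + M]) for i in range(0, len(data), M)]

    # Phase 2: k-way merge, keeping only one block of B elements per chunk
    # (plus the heap) "in memory" at any time.
    heap, positions, in_heap = [], [0] * len(chunks), [0] * len(chunks)
    for c, chunk in enumerate(chunks):
        for x in chunk[:B]:                    # read the first block of each chunk
            heapq.heappush(heap, (x, c))
        positions[c] = in_heap[c] = len(chunk[:B])

    out = []
    while heap:
        x, c = heapq.heappop(heap)             # repeatedly pick the smallest element
        out.append(x)
        in_heap[c] -= 1
        if in_heap[c] == 0 and positions[c] < len(chunks[c]):
            block = chunks[c][positions[c]:positions[c] + B]   # "read" the next block
            positions[c] += len(block)
            in_heap[c] = len(block)
            for y in block:
                heapq.heappush(heap, (y, c))
    return out

# external_merge_sort(list(range(10_000, 0, -1))) == list(range(1, 10_001))
```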

This knowledge is courtesy of my I/O-algorithms course, by Arge and Brodal (http://daimi.au.dk/~large/ioS08/); I also conducted experiments which validate the theory: heap sort takes "almost infinite" time once you exceed memory, quicksort becomes unbearably slow, merge sort is barely bearably slow, and the I/O-efficient merge sort performs well (the best of the bunch).

Jonas Kölker
This isn't a failing of Big-O, it's a failure to properly model your execution time.
Adam Rosenfield
+2  A: 

Big O does not say e.g. that algorithm A runs faster than algorithm B. It can say that the time or space used by algorithm A grows at a different rate than algorithm B, when the input grows. However, for any specific input size, big O notation does not say anything about the performance of one algorithm relative to another.

For example, A may be slower per operation, but have a better big-O than B. B is more performant for smaller input, but if the data size increases, there will be some cut-off point where A becomes faster. Big-O in itself does not say anything about where that cut-off point is.
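
For instance, with made-up per-operation cost models (the constants here are purely hypothetical), you can locate that cut-off numerically, which Big-O alone cannot do:

```python
from math import log2

# Hypothetical cost models: A is O(n log n) with a large constant factor,
# B is O(n^2) with a small one. The constants are invented for illustration.
def cost_a(n): return 50 * n * log2(n)
def cost_b(n): return 0.5 * n * n

n = 2
while cost_a(n) >= cost_b(n):        # find the first n where A overtakes B
    n += 1
print("A becomes faster at n =", n)  # around 1000 for these constants
```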

JacquesB
+2  A: 

The general answer is that Big-O allows you to be really sloppy by hiding the constant factors. As mentioned in the question, the use of Fibonacci Heaps is one example. Fibonacci Heaps do have great asymptotic runtimes, but in practice the constant factors are way too large to be useful for the sizes of data sets encountered in real life.

Fibonacci Heaps are often used in proving a good lower bound for asymptotic complexity of graph-related algorithms.

Another similar example is the Coppersmith-Winograd algorithm for matrix multiplication. It is currently the algorithm with the fastest known asymptotic running time for matrix multiplication, O(n^2.376). However, its constant factor is far too large to be useful in practice. Like Fibonacci Heaps, it's frequently used as a building block in other algorithms to prove theoretical time bounds.

Adam Rosenfield
+2  A: 

I've seen a few cases where, as the data set grew, the algorithmic complexity became less important than the memory access pattern. Navigating a large data structure with a smart algorithm can, in some cases, cause far more page faults or cache misses than an algorithm with a worse big-O.

For small n, two algorithms may be comparable. As n increases, the smarter algorithm outperforms. But, at some point, n grows big enough that the system succumbs to memory pressure, in which case the "worse" algorithm may actually perform better because the constants are essentially reset.

This isn't particularly interesting, though. By the time you reach this inversion point, the performance of both algorithms is usually unacceptable, and you have to find a new algorithm that has a friendlier memory access pattern AND a better big-O complexity.

Adrian McCarthy
A: 

The short answer: always on modern hardware when you start using a lot of memory. The textbooks assume memory access is uniform, and it is no longer. You can of course do Big O analysis for a non-uniform access model, but that is somewhat more complex.

The small n cases are obvious but not interesting: fast enough is fast enough.

In practice I've had problems using the standard collections in Delphi, Java, C# and Smalltalk with a few million objects, and with smaller ones where the dominant factor proved to be the hash function or the comparison.

Stephan Eggermont