Quicksort has a worst-case time complexity of O(n^2), but it is usually considered better than other sorting algorithms that have O(n log n) time complexity in the worst case.
The simplex algorithm has exponential time complexity in the worst case, but for practically any real instance it runs in polynomial time. Provably polynomial algorithms for linear programming exist, but they are very complicated and usually have large constant factors.
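As a minimal sketch of how routinely LP gets solved in practice (assuming SciPy is available; the tiny problem is purely illustrative, and whether the backend ends up using a simplex or interior-point method depends on the SciPy version):

```python
# Solve: maximize x + 2y subject to x + y <= 4, x >= 0, y >= 0.
# linprog minimizes, so the objective is negated.
from scipy.optimize import linprog

result = linprog(c=[-1, -2],
                 A_ub=[[1, 1]], b_ub=[4],
                 bounds=[(0, None), (0, None)])
print(result.x, -result.fun)   # optimal point and maximized objective value
```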
Often an algorithm (like quicksort) that can be easily parallelized or randomized will be chosen over competing algorithms that lack these qualities. Furthermore, it is often the case that an approximate solution to a problem is acceptable when an exact algorithm would yield exponential runtimes, as in the Travelling Salesman Problem.
This statement can be applied to nearly any parallel algorithm. The reason they were not heavily researched in the early days of computing is that, for a single thread of execution (think uniprocessor), they are indeed slower than their well-known sequential counterparts in terms of asymptotic complexity, constant factors for small n, or both. However, in the context of current and future computing platforms, an algorithm which can make use of a few (think multicore), a few hundred (think GPU), or a few thousand (think supercomputer) processing elements will beat the pants off the sequential version in wall-clock time, even if the total time/energy spent by all processors is much greater for the parallel version.
Sorts, graph algorithms, and linear algebra techniques alike can be accelerated in terms of wall-clock time by bearing the cost of a little extra bookkeeping, communication, and runtime overhead in order to parallelize.
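A minimal sketch of that trade-off in plain Python, using the standard multiprocessing module (the per-element workload function is just an illustrative stand-in):

```python
# Parallelizing an embarrassingly parallel map with the standard library.
# The extra process and IPC overhead is the "bookkeeping" cost paid in
# exchange for a wall-clock speedup on multicore hardware.
from multiprocessing import Pool

def expensive_step(x):
    # stand-in for real per-element work
    return sum(i * i for i in range(x))

if __name__ == "__main__":
    data = [200_000] * 16
    with Pool() as pool:                  # one worker process per core by default
        results = pool.map(expensive_step, data)
    print(len(results))
```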
"Worse is Better" can be seen in languages too, for example the ideas behind Perl, Python, Ruby, Php even C# or Java, or whatever language that isn't assembler or C (C++ might fit here or not).
Basically there is always a "perfect" solution, but many times its better to use a "worse" tool/algorithm/language to get results faster, and with less pain. Thats why people use these higher level languages, although they are "worse" from the ideal computer-language point of view, and instead are more human oriented.
I've always understood the term 'worse is better' to relate to problems whose correct solutions are very complex, but where an approximate (or good-enough) solution exists that is relatively easier to comprehend.
This makes for easier design, production, and maintenance.
After a lifetime of Windows, early Windows Vista forced me to switch to Apple.
Now people think I know nothing about computers; it's great. Two examples for you.
There's an O(n) algorithm for selecting the k-th largest element from an unsorted set, but it is rarely used instead of sorting, which is of course O(n log n).
This example would be the answer if there were no computers capable of storing these large collections.
Presumably the size of the collection was 641K.
When I worked in the technical computing group at BAE SYSTEMS, which looked after structural and aerodynamic code for various aircraft, we had a codebase going back at least 25 years (and a third of the staff had been there that long).
Many of the algorithms were optimised for performance on a 16-bit mainframe, rather than for scalability. These optimisations were entirely appropriate for 1970s hardware, but performed poorly on larger datasets on the 32- and 64-bit systems which replaced it. If you're choosing something with worse scalability which works better on the hardware you are currently working on, be aware that this is an optimisation, and it may not apply in the future. At the time those 1970s routines were written, the data sizes we put into them in the 2000s were not practical. Unfortunately, trying to extract a clear algorithm from those codes which could then be implemented to suit modern hardware was not trivial.
Short of boiling the oceans, what counts as 'all practical situations' is often a time-dependent variable.
Monte Carlo integration is a probabilistic method of calculating definite integrals that has no guarantee of returning the correct answer. Yet, in real-world situations it returns an accurate answer far faster than provably correct methods.
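A minimal sketch, assuming nothing beyond the standard library (the integrand and sample count are illustrative):

```python
# Monte Carlo estimate of the definite integral of f on [a, b].
# No correctness guarantee, but the error shrinks like 1/sqrt(samples).
import math
import random

def mc_integrate(f, a, b, samples=100_000):
    total = sum(f(random.uniform(a, b)) for _ in range(samples))
    return (b - a) * total / samples

# Example: the integral of sin(x) over [0, pi] is exactly 2.
print(mc_integrate(math.sin, 0.0, math.pi))
```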
Radix sort has time complexity O(n) for fixed-length inputs, but quicksort is more often used, despite the worse asymptotic runtime, because the per-element overhead of radix sort is typically much higher.
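For comparison, a minimal sketch of an LSD radix sort for non-negative integers (the 4-byte key width is an assumption for illustration):

```python
# LSD radix sort, base 256. O(n) for fixed-width keys, but every pass walks
# the whole input and rebuilds it from buckets, which is the per-element
# overhead that often makes quicksort faster in practice.
def radix_sort(values, key_bytes=4):
    for shift in range(0, key_bytes * 8, 8):
        buckets = [[] for _ in range(256)]
        for v in values:
            buckets[(v >> shift) & 0xFF].append(v)
        values = [v for bucket in buckets for v in bucket]
    return values

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```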
OK, consider solving the travelling salesman problem. The ONLY perfect solution is to test all possible routes. However, that becomes impossible with our hardware and time limits as N increases. So we have thought of many heuristics.
Which brings us to the answer to your question. Heuristics (worse) are better than brute force for NP-complete problems. This describes the situation in which "Worse is Better" is always true.
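A minimal sketch of one such heuristic, the nearest-neighbour tour (the city coordinates are made up for illustration):

```python
# Nearest-neighbour heuristic for TSP: no optimality guarantee, but O(n^2)
# instead of the O(n!) of brute-force enumeration.
import math

def nearest_neighbour_tour(points):
    unvisited = set(range(1, len(points)))
    tour = [0]                            # start arbitrarily at city 0
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

cities = [(0, 0), (5, 1), (1, 4), (6, 5), (2, 2)]
print(nearest_neighbour_tour(cities))
```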
Mergesort versus Quicksort
Quicksort has an average time complexity of O(n log n). It can sort arrays in place, i.e. it needs only O(1) auxiliary space beyond the recursion stack.
Merge sort also has an average time complexity of O(n log n); however, its space complexity is much worse: Θ(n). (There is a special case for linked lists.)
Because the worst-case time complexity of quicksort is Θ(n^2) (i.e. all elements fall on the same side of every pivot), while mergesort's worst case is O(n log n), mergesort is the default choice for library implementers.
In this case, I think that the predictability of mergesort's worst-case time complexity trumps quicksort's much lower memory requirements.
Given that it is possible to vastly reduce the likelihood of quicksort's worst-case time complexity (via random selection of the pivot, for example), I think one could argue that mergesort is worse in all but the pathological case of quicksort.
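A minimal sketch of quicksort with a random pivot, the mitigation mentioned above (purely illustrative, not any particular library's implementation):

```python
# In-place quicksort with a random pivot, which makes the Θ(n^2) worst case
# vanishingly unlikely on any fixed input.
import random

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[random.randint(lo, hi)]
    i, j = lo, hi
    while i <= j:                          # partition around the pivot value
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i, j = i + 1, j - 1
    quicksort(a, lo, j)
    quicksort(a, i, hi)

data = [5, 3, 8, 1, 9, 2, 7]
quicksort(data)
print(data)
```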
Insertion sort, despite having O(n^2) complexity, is faster for small collections (n < 10) than any other sorting algorithm. That's because the nested loop is small and executes fast. Many libraries (including the STL) that provide a sort implementation actually use it for small subsets of data to speed things up.
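A minimal sketch of the algorithm in question (the cutoff of roughly 10 elements mentioned above is where hybrid library sorts typically switch to it):

```python
# Insertion sort: O(n^2), but the tight inner loop makes it the fastest
# choice for very small arrays, which is why library sorts fall back to it
# below a small cutoff.
def insertion_sort(a):
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:       # shift larger elements one slot right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

print(insertion_sort([7, 2, 9, 4, 1]))
```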
When calculating the median of a group of numbers, you can use an algorithm very similar to quicksort. You partition around a number, and all the bigger ones go to one side, and all the smaller ones go to the other side. Then you throw away one side and recurse into the side that still contains the median. This takes O(n^2) in the worst case, but is pretty fast (O(n) with a low constant) in the average case.
You can get guaranteed worst-case O(n) performance, with a constant of about 40. This is called the median of medians algorithm. In practice, you would never use this.
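A minimal sketch of that partition-and-recurse selection (quickselect) with a random pivot; the guaranteed-O(n) median-of-medians variant would only change how the pivot is chosen:

```python
# Quickselect: expected O(n), worst case O(n^2).
import random

def quickselect(a, k):
    """Return the k-th smallest element of a (0-indexed)."""
    pivot = random.choice(a)
    smaller = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    larger  = [x for x in a if x > pivot]
    if k < len(smaller):
        return quickselect(smaller, k)
    if k < len(smaller) + len(equal):
        return pivot
    return quickselect(larger, k - len(smaller) - len(equal))

data = [9, 1, 8, 2, 7, 3, 6]
print(quickselect(data, len(data) // 2))   # the median, 6
```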
Not quite on the mark, but backtracking-based regular expressions have an exponential worst case versus O(N) for DFA-based regular expressions, yet backtracking-based regular expressions are almost always used rather than DFA-based ones.
EDIT: (JFS)
The power that backreferences add comes at great cost: in the worst case, the best known implementations require exponential search algorithms.
This method (DFA) is really more efficient, but it also has important drawbacks:
- You have no control over the way the expression returns a match; it will always return the longest-leftmost match, no matter how you have crafted your expression
- There is no backtracking, so all repeating operators are greedy. Atomic grouping and possessive quantifiers are also meaningless
- Lookarounds are impossible
- Capturing and back-references are also impossible
- Regex pre-compilation takes longer and uses more memory
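A minimal sketch of the trade-off, assuming CPython's backtracking re module; the pattern and input are the classic pathological pair, and the exact timing varies by machine:

```python
# Catastrophic backtracking: this pattern takes exponential time to *fail*
# on this input (each extra 'a' roughly doubles the runtime), whereas a
# DFA-based engine would reject it in linear time.
import re
import time

text = "a" * 25 + "b"                  # no match: forces every possible split
start = time.perf_counter()
print(re.match(r"(a+)+$", text))       # None, after exponential backtracking
print(f"{time.perf_counter() - start:.2f} s")
```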
Monte Carlo integration was already suggested, but a more specific example is Monte Carlo pricing in finance. Here the method is much easier to code and can do more things than some others, BUT it is much slower than, say, finite difference methods.
It's not practical to do 20-dimensional finite difference algorithms, but a 20-dimensional pricing run is easy to set up.
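A minimal sketch of Monte Carlo pricing for a single European call under Black-Scholes assumptions (all parameter values are illustrative):

```python
# Monte Carlo price of a European call: simulate terminal prices under
# geometric Brownian motion, average the discounted payoff.
import math
import random

def mc_call_price(s0, strike, rate, vol, maturity, paths=100_000):
    payoff_sum = 0.0
    for _ in range(paths):
        z = random.gauss(0.0, 1.0)
        st = s0 * math.exp((rate - 0.5 * vol**2) * maturity
                           + vol * math.sqrt(maturity) * z)
        payoff_sum += max(st - strike, 0.0)
    return math.exp(-rate * maturity) * payoff_sum / paths

print(mc_call_price(s0=100, strike=105, rate=0.05, vol=0.2, maturity=1.0))
```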
If I understand the question, you are asking for algorithms that are theoretically better but practically worse in all situations. Therefore, one would not expect them to actually be used unless by mistake.
One possible example is universal memoization. Theoretically, all deterministic function calls should be memoized for all possible inputs. That way complex calculations could be replaced by simple table lookups. For a wide range of problems, this technique productively trades time for storage space. But suppose there were a central repository of the results of all possible inputs for all possible functions used by all of humanity's computers. The first time anyone anywhere did a calculation it would be the last time. All subsequent tries would result in a table lookup.
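On a single machine this is just ordinary memoization; a minimal sketch with the standard library (the Fibonacci function is only a stand-in for an expensive deterministic calculation):

```python
# functools.lru_cache stores the results of a deterministic function so that
# repeated calls become dictionary lookups. The "universal repository" above
# is this idea scaled past the point where it stops paying off.
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(200))   # instant; the naive recursion would take astronomically long
```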
But there are several reasons I can think of for not doing this:
The memory space required to store all results would likely be impossibly large. It seems likely the number of needed bits would exceed the number of particles in the universe. (But even the task of estimating that number is daunting.)
It would be difficult to construct an efficient algorithm for memoizing so huge a problem space.
The cost of communication with the central repository would likely exceed the benefit as the number of clients increases.
I'm sure you can think of other problems.
In fact, this sort of time/space trade-off is incredibly common in practice. Ideally, all data would be stored in L1 cache, but because of size limitations you always need to put some data on disk or (horrors!) tape. Advancing technology reduces some of the pain of these trade-offs, but as I suggested above there are limits.
In response to J.F. Sebastian's comment:
Suppose that instead of a universal memoization repository, we consider a factorial repository. And it won't hold the results for all possible inputs. Rather it will be limited to results from 1! to N!.
Now it's easy to see that any computer that did factorials would benefit from looking up the result rather than doing the calculation. Even for calculating (N+1)!, the lookup would be a huge win since that calculation would reduce to N!(N+1).
Now to make this "better" algorithm worse, we could either increase N or increase the number of computers using the repository.
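A minimal sketch of such a factorial table on one machine (the list layout is illustrative), exploiting exactly that (N+1)! = N!(N+1) step:

```python
# A table of 1! .. N! that callers consult instead of recomputing, extended
# incrementally: each new entry costs one multiplication of the last entry.
factorials = [1]                   # factorials[k] holds (k+1)!

def factorial(n):
    while len(factorials) < n:     # extend the table on demand
        factorials.append(factorials[-1] * (len(factorials) + 1))
    return factorials[n - 1]

print(factorial(10))               # 3628800, looked up (or built once) from the table
```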
But I'm probably not understanding some subtlety of the question. The way I'm thinking of it, I keep coming up with examples that scale well until they don't.
The Coppersmith–Winograd algorithm for square matrix multiplication. Its time complexity is O(n^2.376) vs. O(n^3) for a naive multiplication algorithm, or O(n^2.807) for the Strassen algorithm.
From the Wikipedia article:
However, unlike the Strassen algorithm, it is not used in practice because it only provides an advantage for matrices so large that they cannot be processed by modern hardware (Robinson 2005).
Iterative Deepening
When compared to a trivial depth-first search augmented with alpha-beta pruning, an iterative deepening search used in conjunction with a poor (or non-existent) branch ordering heuristic would result in many more nodes being scanned. However, when a good branch ordering heuristic is used, a significant portion of the tree is eliminated due to the enhanced effect of the alpha-beta pruning. A second advantage, unrelated to time or space complexity, is that a guess of the solution over the problem domain is established early, and that guess gets refined as the search progresses. It is this second advantage that makes it so appealing in many problem domains.
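A minimal sketch of plain iterative deepening depth-first search, without the alpha-beta layer (the toy graph and goal are illustrative):

```python
# Iterative deepening DFS: re-scans the shallow levels on every iteration
# (the "worse" part), but finds the shallowest solution with only O(depth)
# memory instead of BFS's O(breadth^depth).
def depth_limited(graph, node, goal, limit):
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for child in graph.get(node, []):
        path = depth_limited(graph, child, goal, limit - 1)
        if path is not None:
            return [node] + path
    return None

def iterative_deepening(graph, start, goal, max_depth=20):
    for depth in range(max_depth + 1):
        path = depth_limited(graph, start, goal, depth)
        if path is not None:
            return path
    return None

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "E": ["G"]}
print(iterative_deepening(tree, "A", "G"))   # ['A', 'B', 'E', 'G']
```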
The Spaghetti sort is better than any other sorting algorithm in that it is O(n) to set up, O(1) to execute and O(n) to extract the sorted data. It accomplishes all of this in O(n) space complexity. (Overall performance: O(n) in time and space both.) Yet, for some strange (obvious) reason, nobody uses it for anything at all, preferring the far inferior O(n log n) algorithms and their ilk.