Note that I don't have a "problem" and I'm not looking for "another way to find the big O of my algorithm".

What I'd like to know is whether it would be possible to write a program to which you'd pass data points that are all performance measurements of an algorithm for various input sizes (n, time taken to solve the problem for n), and that would then determine the complexity of your algorithm.

For example, here's what the input could be (it could be much larger; it's really just an example, that's not the point of the question):

    36 000 took 16 ms
   109 000 took 21 ms
   327 000 took 68 ms
   984 000 took 224 ms
 2 952 000 took 760 ms
 8 857 000 took 2305 ms
26 571 000 took 7379 ms
79 716 000 took 23336 ms

Using this kind of data, is it possible to write a program that would tell whether we have, say, an O(n), O(log n), O(n log n) or O(n!) algorithm?
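As a quick sanity check on data like the above (my own illustration, not a full answer): if time ≈ c·n^k, then log(time) is linear in log(n) with slope k, so a least-squares fit of log(time) against log(n) estimates the exponent. A sketch in Python, using the measurements from the question:

```python
import math

# Timing data from the question: (input size n, time in ms).
data = [
    (36_000, 16), (109_000, 21), (327_000, 68), (984_000, 224),
    (2_952_000, 760), (8_857_000, 2305), (26_571_000, 7379),
    (79_716_000, 23336),
]

# If time ~ c * n^k, then log(time) = log(c) + k * log(n), so the
# least-squares slope of log(time) vs. log(n) estimates the exponent k.
xs = [math.log(n) for n, _ in data]
ys = [math.log(t) for _, t in data]
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))

print(f"estimated exponent: {slope:.2f}")  # close to 1 => roughly linear
```

On this particular data the slope comes out very close to 1, i.e. roughly linear growth. Note that this simple heuristic cannot separate O(n) from O(n log n), which is one reason the curve-fitting approaches in the answers below fit whole candidate functions instead of a single exponent.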

+4  A: 

I think you could approximate it using regressions, but not get exact results. That's because most algorithms have different performance depending on what the input is (not just the size). So to understand this fully, you would need the source.

Matthew Flaschen
You'd want to try each input size several times with different random data. Also, you could measure the number of low-level computations (e.g. the number of element-comparisons, if you're looking at sorting algorithms) instead of the time.
MatrixFrog
+16  A: 

What you are looking for is Curve fitting. All the simple algorithms for this problem that I know of will try to fit the data points to some sort of polynomial, but I suspect there are also ones that can differentiate between polynomials and non-polynomials.
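A minimal sketch of that idea in Python (the candidate set and the synthetic data are my own illustration): for each candidate growth function f, fit the single scale constant c in t ≈ c·f(n) by least squares, then pick the candidate with the smallest residual.

```python
import math

# Hypothetical timing measurements: (n, time). Generated here from an
# n log n process plus a small constant, so the right answer is known.
data = [(n, 0.5 * n * math.log(n) + 20) for n in
        (1_000, 5_000, 20_000, 100_000, 500_000)]

candidates = {
    "n":       lambda n: n,
    "n log n": lambda n: n * math.log(n),
    "n^2":     lambda n: n * n,
}

def residual(f):
    # Best least-squares scale c for t ~ c*f(n): c = sum(t*f)/sum(f*f).
    c = sum(t * f(n) for n, t in data) / sum(f(n) ** 2 for n, _ in data)
    return sum((t - c * f(n)) ** 2 for n, t in data)

best = min(candidates, key=lambda name: residual(candidates[name]))
print(best)  # the n log n candidate should win on this data
```

Restricting the fit to a fixed menu of candidate functions is what lets this distinguish, say, n from n log n, where a free-form polynomial fit would struggle.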

Max Shawabkeh
You can certainly also do e.g. exponential regressions (http://mathbits.com/Mathbits/TISection/Statistics2/exponential.htm)
Matthew Flaschen
+1, curve fitting does indeed seem to be what I was after. +1 to Matthew too; his link is very interesting.
Webinator
Note that this won't necessarily give you the Big-O performance of an algorithm, which is the asymptotic behaviour as n -> infinity. Sometimes lower-order terms still dominate at values of `n` that seem pretty big at the time.
Mike Graham
@Mike: definitely, but it will still give you a decent approximation for the range you most likely care about, and algorithms whose complexity changes only at very large inputs are often fairly easy to spot.
Max Shawabkeh
+8  A: 

You can use curve fitting (see @Max S.) to determine the formula that describes your data. However, this is only half of the story, as there is no way to know whether the data describes your algorithm to its full extent.

For instance, your algorithm may exhibit linear behavior for n < 1,000,000,000 and then start behaving quadratically. If you don't have any data points where n > 1,000,000,000, your analysis program will not be able to give you a correct answer.
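To illustrate with a made-up cost model (the threshold and constants are arbitrary, chosen only so the effect is visible): fit a linear model to measurements taken entirely below the threshold, and the extrapolation beyond it is wrong by orders of magnitude.

```python
# Hypothetical cost model: linear below a threshold, quadratic above it.
THRESHOLD = 1_000_000

def cost(n):
    if n <= THRESHOLD:
        return 2 * n
    return 2 * THRESHOLD + (n - THRESHOLD) ** 2 // 1_000

# Fit time ~ c * n using only small-n samples (all below the threshold).
small = [(n, cost(n)) for n in (10_000, 50_000, 200_000, 800_000)]
c = sum(t * n for n, t in small) / sum(n * n for n, _ in small)  # c == 2.0

big_n = 10_000_000
predicted = c * big_n          # what the linear model extrapolates
actual = cost(big_n)           # dominated by the quadratic term

print(actual / predicted)      # off by orders of magnitude
```

The fit itself is perfect on the sampled range; nothing in the small-n data hints that the model stops holding past the threshold.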

Thus, to conclude: you can do it programmatically, but the results will be limited to the data points in your sample. And there is no algorithmic way to determine whether the sample sufficiently covers all the "interesting" points.

Itay
+2  A: 

Most big-O analyses assume an idealized machine with infinite memory, uniform access time, no influence from other applications, and so on. Crossing thresholds like cache sizes or main memory size (paging in and out of the swap file) in particular can have a significant influence on performance. So what you determine is how the algorithm performs in the real world, not its idealized runtime.

Ritsaert Hornstra
+3  A: 

If you're trying to estimate big-O empirically, you must be very careful to ensure that you're testing on a wide range of instances at each size. Remember that big-O is a worst-case notion. It is not uncommon to find algorithms which perform well on all but a few pathological inputs, but it is exactly those pathological cases that determine the big-O time. That is, if you miss the pathological cases in your sampling, you could come away with the idea that an O(2^n) algorithm is O(n) instead.
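As a concrete illustration (a toy quicksort of my own, with a first-element pivot): count element comparisons on shuffled vs. already-sorted input. Sampling only random inputs would suggest n log n behavior, while the sorted input triggers the quadratic worst case.

```python
import random

def quicksort_comparisons(items):
    """Count comparisons made by a naive quicksort (first element as
    pivot), using an explicit stack to avoid recursion limits."""
    comparisons = 0
    stack = [list(items)]
    while stack:
        seg = stack.pop()
        if len(seg) <= 1:
            continue
        pivot, rest = seg[0], seg[1:]
        comparisons += len(rest)               # one comparison per element
        stack.append([x for x in rest if x < pivot])
        stack.append([x for x in rest if x >= pivot])
    return comparisons

random.seed(0)
n = 500
shuffled = random.sample(range(n), n)
avg_case = quicksort_comparisons(shuffled)     # on the order of n log2 n
worst_case = quicksort_comparisons(range(n))   # n*(n-1)/2 on sorted input

print(avg_case, worst_case)
```

Here the already-sorted input costs exactly n·(n-1)/2 comparisons, many times more than the shuffled one, even though "sorted" hardly looks pathological when you're choosing test inputs.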

If what you really need is the big-O time, and not just an idea of average performance, then I recommend proving it analytically. Without doing that, you cannot be certain that you haven't missed some pathological input.

uckelman