I found some references about big-O notation, but as far as I understand, algorithm complexity is a function of the size of the input data.

For example, if the complexity of bubble sort is O(n^2), then n is the size of the input array. Right?

But how can I determine the complexity of an algorithm that has a fixed number of inputs and whose running time depends on the values of those inputs? For example, finding the Greatest Common Divisor (GCD) could look like this:

def GCD(x, y):
    while x != y:
        if x < y:
            x, y = y - x, x
        else:
            x, y = x - y, x
    return x

What is the complexity of this algorithm? And how is it determined?

Edit: Changed the name of the function and corrected the name of the algorithm. ShreevatsaR, thanks for pointing it out.

+1  A: 

The input size is the sum of the sizes of the numbers x and y (e.g. how many digits are in each number).

newacct
That's possible -- but extremely, extremely unlikely in this case. If you use the sizes of `x` and `y`, then the order-of estimate is O(10^n). If you use the values of `x` and `y`, then the order-of estimate is O(n).
Chip Uni
+1  A: 

Big-O complexity describes worst-case asymptotic runtime behavior. It's not necessarily dependent on the input size (the quantity of inputs) to a particular algorithm, though that is often the case. In other words, it's the limiting function that describes the runtime as the inputs are taken to infinity.

In your case, if x or y is unbounded, so is the asymptotic runtime. If not, think about the run time when x = 1 and y = Int32.MaxValue.
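
For example, with x = 1 the loop can only chip away at the larger value a little at a time, so GCD(1, Int32.MaxValue) needs on the order of 2^31 iterations; double y and the running time roughly doubles, even though there are still only two inputs.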

Paul
+2  A: 

Big-O notation should specify what is being measured.

For example, Big-O notation for sort algorithms usually measures the number of comparisons.

Your GCD example can be measured by relating the values of x and y to the number of instructions executed. Let's look more closely:

def GCD(x, y):
    while x != y:               # 1
        if x < y:               # 2
            x, y = y - x, x     # 3
        else:                   # 4
            x, y = x - y, x     # 5
    return x                    # 6

Work from the inside to the outside.

No matter the values of x and y, step 3 or step 5 will always take one instruction. Therefore, the if statement of step 2, together with its branch, will always take two instructions.

The harder question is step 1. With every iteration, the larger of the two values is reduced by the smaller one. Assume that x > y. One of two things will happen:

  • If x % y == 0 to begin with, then the loop will execute roughly x / y times and the program will stop.

  • Otherwise, x will be reduced roughly x / y times before it becomes smaller than y, and the program will continue.

You can easily measure the number of instructions for any given x and y, and you can show that, for a given value z = max(x, y), the number of subtractions -- and therefore the number of instructions -- grows at most linearly with z. (Think about GCD(z, 1).)
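
As a rough sanity check (the function name below is made up for illustration), the same loop can be instrumented to count the step-2 comparisons and the step-3/step-5 subtractions; there is one of each per iteration, and for GCD(z, 1) both counts grow roughly linearly with z:

def gcd_instruction_counts(x, y):
    # The GCD loop above, instrumented to count the step-2
    # comparisons and the step-3/step-5 subtractions.
    comparisons = subtractions = 0
    while x != y:
        comparisons += 1         # step 2
        if x < y:
            x, y = y - x, x      # step 3
        else:
            x, y = x - y, x      # step 5
        subtractions += 1
    return comparisons, subtractions

for z in (10, 100, 1000):
    print(z, gcd_instruction_counts(z, 1))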

Chip Uni
So is the behaviour `O(max(x,y))` then?
Peter Kofler
Yes. It's `O(max(x,y))`.
Chip Uni
+7  A: 

People play a bit fast and loose with big-O notation. In the case of GCD, they generally do it in one of two ways:

1) You're right, algorithmic complexity, and hence big-O notation, should be stated in terms of the size in bits of the input, not in terms of the values of the input. That's how P, NP, and so on are defined. Assuming binary input and arbitrarily-large numbers (like a BigNum representation), and N the number of bits of input, your GCD requires at most 2^N subtractions, each of which requires time O(N) to run over each digit of the numbers being subtracted. So it's O(N*2^N). GCD can of course be done much faster if you use division rather than subtraction: O(N^2).
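
For reference, here is a minimal sketch of that division-based version (ordinary Euclid's algorithm using the modulo operator; the function name is just for illustration). Its loop runs only O(N) times when N is the number of bits of the inputs, which is what makes the much better bound possible:

def gcd_division(x, y):
    # Euclid's algorithm with division: each step replaces (x, y)
    # with (y, x % y), which shrinks the values far faster than
    # repeated subtraction does.
    while y != 0:
        x, y = y, x % y
    return x

print(gcd_division(1071, 462))  # 21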

So, when we say that testing primality was proved in 2002 to be done in polynomial time, that's the technical definition of complexity, and we mean polynomial in the number of digits (which is the tricky part), not polynomial in the input number itself (which is trivially easy to do in "sub-linear time" using trial division).
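
As a sketch of what "sub-linear in the input number" looks like, trial division only needs about sqrt(n) steps -- tiny compared with n itself, but still exponential in the number of digits of n (the function name is made up for illustration):

def is_prime_trial_division(n):
    # Try every candidate divisor d with d * d <= n.  The loop runs
    # about sqrt(n) times: sub-linear in the value n, but
    # exponential in the number of digits of n.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

print(is_prime_trial_division(104729))  # True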

But in practice, for algorithms which take a fixed number of integer inputs, it's more convenient to talk about complexity as though N were the input itself, not the size of the input. It's not harmful as long as you're clear what you mean in cases that might be ambiguous.

2) In practice, integer inputs are often fixed-size, 32 bit or whatever, and operations on them such as addition, multiplication and division are O(1) time. We use these facts selectively in our order analysis. Technically if your GCD program only accepts inputs up to (2^32-1), then it is O(1). It has a fixed upper bound on its running time. End of analysis.

Although technically correct that's not a very useful analysis. Almost anything you do on a real computer is O(1) on the same basis, that the size of the problem is constrained by the hardware.

It's usually more convenient to accept that addition is O(1) because the numbers are fixed-size, but ignore that GCD is also O(1), pretend that its behaviour in the range [1, 2^32) extends to infinity, and analyse it on that basis. Then with N the max of the two inputs, it comes out O(N): O(N) subtractions, each taking constant time.

Again, this is not ambiguous once you know what the terms of reference are, but beware of incorrectly comparing the first analysis I gave of Euclid's algorithm with division, O(N^2), against this analysis of the algorithm with subtraction, O(N). N is not the same in each, and subtraction is not faster ;-)
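
To make that concrete, here is a rough sketch (the helper names are made up) that counts loop iterations of the subtraction version and the division version on the same inputs. The subtraction count keeps growing with the values, while the division count stays small because it is governed by the number of digits, so the two Ns in the analyses above measure different things:

def subtraction_steps(x, y):
    # Iterations of the question's subtraction-based loop.
    steps = 0
    while x != y:
        if x < y:
            x, y = y - x, x
        else:
            x, y = x - y, x
        steps += 1
    return steps

def division_steps(x, y):
    # Iterations of Euclid's algorithm with division.
    steps = 0
    while y != 0:
        x, y = y, x % y
        steps += 1
    return steps

for x, y in ((10**3, 7), (10**4, 7), (10**5, 7)):
    print(x, y, subtraction_steps(x, y), division_steps(x, y))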

Steve Jessop