views: 2742
answers: 16

I am currently learning about Big O notation running times and amortized times. I understand the notion of O(n) linear time, meaning that the running time of the algorithm grows proportionally to the size of its input. The same goes for, for example, quadratic time O(n²), and even for algorithms, such as permutation generators, with O(n!) times, which grow by factorials.

For example, the following function is O(n) because the algorithm grows in proportion to its input n:

#include <stdio.h>

/* Prints 0 through n-1: the loop body runs exactly n times, so the
   running time grows linearly with n. */
void f(int n) {
  int i;
  for (i = 0; i < n; ++i)
    printf("%d", i);
}

Similarly, if there were a nested loop, the time would be O(n²), as in the sketch below.
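For instance, a minimal sketch of the quadratic case (illustrative only; the function name g is made up): the outer loop runs n times, and each of those iterations runs the inner loop another n times, for n × n units of work.

#include <stdio.h>

/* Visits every pair (i, j) exactly once: n * n iterations, hence O(n^2). */
void g(int n) {
  int i, j;
  for (i = 0; i < n; ++i)
    for (j = 0; j < n; ++j)
      printf("%d,%d ", i, j);
}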

But what exactly is O(log n)? For example, what does it mean to say that the height of a complete binary tree is O(log n)?

I do know (maybe not in great detail) what a logarithm is, in the sense that log10 100 = 2, but I cannot understand how to identify a function with logarithmic running time.

+23  A: 

Logarithmic running time (O(log n)) essentially means that the running time grows in proportion to the logarithm of the input size. As an example: if 10 items take at most some amount of time x, 100 items take at most, say, 2x, and 10,000 items take at most 4x, then it's looking like an O(log n) time complexity.
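A minimal sketch of where that behaviour typically comes from (illustrative only): a loop that halves its input on every iteration executes roughly log2(n) times.

#include <stdio.h>

/* Halves n until it reaches 1; the loop body runs about log2(n) times. */
int count_halvings(int n) {
  int steps = 0;
  while (n > 1) {
    n /= 2;
    ++steps;
  }
  return steps;
}

int main(void) {
  printf("%d\n", count_halvings(1000000));  /* prints 19, since 2^19 <= 1000000 < 2^20 */
  return 0;
}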

Anon.
+1, but you really should point out that it's log2, not log10.
Adriano Varoli Piazza
log2 or log10 is irrelevant. They only differ by a scale factor, which makes them of the same order, i.e. they still grow at the same rate.
Noldorin
The fun thing about logarithms is that when comparing relative heights, the exact base you use doesn't matter. `log 10,000 / log 100` is 2 regardless of what base you use.
Anon.
To be nitpicky, O(lg n) means that the runtime is *at most* proportional to lg n. What you describe is Theta(lg n).
rgrig
@rgrig: That is true. I've edited in a few "at mosts" to indicate the upper-bound nature of big-O.
Anon.
@rgrig he described both O and theta: Theta(lg n) implies O(lg n)
klochner
@klochner: "what is an even number?" "1+1 is an even number" "you described the number 2" "no, he described both even numbers and the number 2 because being 2 implies being even". What's your point?
@rgrig - I read your comment as applying to the example rather than the definition. If we're nitpicking, he *defined* Theta (loosely), and *described* a function that is both O and Theta
klochner
@klochner: You win the nitpicking game :) I should have said "define".
rgrig
@rgrig Thanks :)
klochner
+3  A: 

It simply means that the time needed for this task grows with log(n) (example: 2 s for n = 10, 4 s for n = 100, ...). Read the Wikipedia articles on the Binary Search Algorithm and Big O Notation for more details.

Valentin Rocher
That's an exponential running time, not a logarithmic one :)
Anon.
Damn you, Sunday, for burning my brain cells! Error corrected.
Valentin Rocher
A: 

The complete binary tree example is O(log n) because a binary search through it proceeds like this:

1 2 3 4 5 6 7 8 9 10 11 12

Searching for 4 takes 3 comparisons: 6, 3, then 4. And log2 12 ≈ 3.58, which is a good approximation of how many comparisons were needed.
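A minimal sketch of that search (illustrative only, assuming a sorted int array): each comparison discards half of the remaining range, so at most about log2(n) comparisons are ever made.

/* Returns the index of key in the sorted array a[0..n-1], or -1 if absent.
   Each iteration halves the remaining range, so it runs in O(log n). */
int binary_search(const int a[], int n, int key) {
  int lo = 0, hi = n - 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;  /* written this way to avoid overflow */
    if (a[mid] == key)
      return mid;
    if (a[mid] < key)
      lo = mid + 1;
    else
      hi = mid - 1;
  }
  return -1;
}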

Am
+1  A: 

If you plot a logarithmic function on a graphical calculator or something similar, you'll see that it rises really slowly -- even more slowly than a linear function.

This is why algorithms with a logarithmic time complexity are highly sought after: even for really big n (let's say n = 10^8, for example), they perform more than acceptably.

Hadewijch Debaillie
+4  A: 

Divide and conquer algorithms usually have a log n component to the running time. This comes from the repeated halving of the input.

In the case of binary search, every iteration you throw away half of the input. It should be noted that in Big-O notation, log is log base 2.

Edit: As noted, the log base doesn't matter, but when deriving the Big-O performance of an algorithm, the log factor will come from halving, hence why I think of it as base 2.

David Kanarek
Why is it log base 2? In randomized quicksort for example, I don't think it is base 2. As far as I know, the base doesn't matter, as log base a (n) = log2 (n) / log2 (a), so every logarithm differs from another by a constant, and constants are ignored in big-O notation. In fact, writing the base of a log in big-O notation is a mistake in my opinion, as you are writing a constant.
IVlad
Re "log is log base 2": http://stackoverflow.com/questions/1569702/is-big-ologn-log-base-e/1569710#1569710
Paul Baker
Very true that it can be converted to any base and it does not matter, but if you are trying to derive the Big-O performance and you see constant halving, it helps to understand that you won't see log base 10 reflected in the code.
David Kanarek
An aside: In things such as B-trees, where nodes have a fan-out of more than 2 (i.e. "wider" than a binary tree), you'll still see O(log n) growth, because it's still divide-and-conquer, but the base of the log will be related to the fan-out.
Roger Lipscombe
+4  A: 

If you had a function that takes:

1 millisecond to complete if you have 2 elements.
2 milliseconds to complete if you have 4 elements.
3 milliseconds to complete if you have 8 elements.
4 milliseconds to complete if you have 16 elements.
...
n milliseconds to complete if you have 2**n elements.

Then it takes log2(n) time, where n is the number of elements. Big O notation, loosely speaking, means that the relationship only needs to hold true for large n, and that constant factors and smaller terms can be ignored.

Mark Byers
+1  A: 

O(log n) is a bit misleading; more precisely it's O(ld n), where ld is "logarithmus dualis" (logarithm with base 2).

The height of a balanced binary tree is O(ld n), since every node has two (note the "two", as in ld) child nodes. So a tree with n nodes has a height of ld n.

Another example is binary search, which has a running time of O(ld n) because with every step you can divide the search space by 2.

stmax
O(log n) is the same order as O(ld n) or O(ln n). They are proportional. I understand that for learning purposes it's easier to use ld.
helios
"more precisely it's O(ld n)" - No, it isn't: all logs are the same order (each differing from the others only by some constant scaling factor, which is ignored/ignorable).
ChrisW
you're right chris, very bad wording. should have said it as helios did. it helps for learning/understanding but finally all logs are the same order.
stmax
+21  A: 

You can think of O(log N) intuitively by saying the time is proportional to the number of digits in N.

If an operation performs constant time work on each digit or bit of an input, the whole operation will take time proportional to the number of digits or bits in the input, not the magnitude of the input; thus, O(log N) rather than O(N).

If an operation makes a series of constant-time decisions, each of which halves (or reduces by a factor of 3, 4, 5, ...) the size of the input to be considered, the whole will take time proportional to log base 2 (base 3, base 4, base 5, ...) of the size N of the input, rather than being O(N).

And so on.
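As a tiny sketch of the digit-based case (illustrative only): counting the decimal digits of n processes one digit per iteration, so it takes about log10(n) steps.

/* Counts the decimal digits of n (for n >= 0): one iteration per digit,
   so the running time is O(log n), not O(n). */
int count_digits(int n) {
  int digits = 1;
  while (n >= 10) {
    n /= 10;
    ++digits;
  }
  return digits;
}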

moonshadow
Accurate enough and more easily grasped than most explanations, I reckon.
Toon Van Acker
+91  A: 
John Feminella
+1 for the phonebook example
Shaihi
+1 for using very concrete examples with the phone book
Andreas Grech
Looking up a name from a number -- "find the person who has the phone number 468-1701" -- is also `O(n)`.
mobrule
For some of your later examples, you should probably point out specifically that the number of phone books is the same as the number of entries. Your O(n squared) example wasn't immediately obvious to me.
David Thornley
@David Thornley » Ah, gotcha. I thought I'd made that obvious by saying "each resident or business" was getting a phonebook, but I'll edit to highlight this.
John Feminella
I have to wonder if the phonebook example is mine http://stackoverflow.com/questions/487258/plain-english-explanation-of-big-o/487278#487278
cletus
@cletus: Coincidental, I'm afraid. I picked it because phonebooks have a large N, people understand what they are and what they do, and because it's versatile as an example. Plus I got to use robots in my explanation! A win all-around. (Also, it looks like your answer was made before I was even a member on StackOverflow to begin with!)
John Feminella
@John: no worries. Was just curious. :)
cletus
@John Feminella: I like the edit you made, it makes your answer extremely clear. Too bad I can't upvote you again.
David Thornley
"A mistake occurred at the office, and every entry in each of the phone books has an extra "0" at the end of the phone number. Take some white-out and remove each zero." <-- this is not order N squared. N is defined as the size of the input. The size of the input is the number of phone numbers, which is the number of numbers per book times the number of books. That's still a linear time operation.
Billy ONeal
@Billy: In this example, `N` is the number of people in a single book. Because every person in the phone book also gets their own copy of the book, there are `N` _identical_ phone books, each with `N` people in it, which is O(N^2).
John Feminella
@John Feminella: N, with regards to Big Oh, is always the size of the input set. If you define N to be something else, that's fine, but you did not do so above....
Billy ONeal
+6  A: 

O(log n) basically means time goes up linearly while n goes up exponentially. So if it takes 1 second to compute 10 elements, it will take 2 seconds to compute 100 elements, 3 seconds to compute 1,000 elements, and so on.

fastcodejava
+2  A: 

O(log n) refers to a function (or algorithm, or step in an algorithm) working in an amount of time proportional to the logarithm (usually base 2, but not always, and in any event this is insignificant in big-O notation*) of the size of the input.

The logarithmic function is the inverse of the exponential function. Put another way, if your input grows exponentially (rather than linearly, as you would normally consider it), your function grows linearly.

O(log n) running times are very common in any sort of divide-and-conquer application, because you are (ideally) cutting the work in half every time. If in each of the division or conquer steps, you are doing constant time work (or work that is not constant-time, but with time growing more slowly than O(log n)), then your entire function is O(log n). It's fairly common to have each step require linear time on the input instead; this will amount to a total time complexity of O(n log n).

The running time complexity of binary search is an example of O(log n). This is because in binary search, you ignore half of your input in each later step by dividing the array in half and only focusing on one half with each step. Each step is constant-time, because in binary search you only need to compare one element with your key in order to figure out what to do next, regardless of how big the array you are considering is at any point. So you do approximately log(n)/log(2) steps.

The running time complexity of merge sort is an example of O(n log n). This is because you are dividing the array in half with each step, resulting in a total of approximately log(n)/log(2) steps. However, in each step you need to perform merge operations on all elements (whether it's one merge operation on two sublists of n/2 elements, or two merge operations on four sublists of n/4 elements, is irrelevant because it adds to having to do this for n elements in each step). Thus, the total complexity is O(n log n).
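A compact top-down sketch of that structure (illustrative only, assuming a scratch buffer is acceptable): the recursion is about log2(n) levels deep, and every level performs O(n) total merge work, giving O(n log n).

#include <stdlib.h>
#include <string.h>

/* Merges the sorted runs a[lo..mid-1] and a[mid..hi-1] via the buffer tmp. */
static void merge(int a[], int tmp[], int lo, int mid, int hi) {
  int i = lo, j = mid, k = lo;
  while (i < mid && j < hi)
    tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
  while (i < mid) tmp[k++] = a[i++];
  while (j < hi)  tmp[k++] = a[j++];
  memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof(int));
}

/* Sorts a[lo..hi-1]: about log2(n) levels of halving, O(n) work per level. */
static void msort(int a[], int tmp[], int lo, int hi) {
  if (hi - lo < 2)
    return;
  int mid = lo + (hi - lo) / 2;
  msort(a, tmp, lo, mid);
  msort(a, tmp, mid, hi);
  merge(a, tmp, lo, mid, hi);
}

void merge_sort(int a[], int n) {
  int *tmp = malloc((size_t)n * sizeof(int));
  if (tmp == NULL)
    return;  /* out of memory: leave the input unchanged */
  msort(a, tmp, 0, n);
  free(tmp);
}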

*Remember that in big-O notation, by definition, constants don't matter. Also, by the change-of-base rule for logarithms, the only difference between logarithms of different bases is a constant factor.

Platinum Azure
+2  A: 

But what exactly is O(log n)?

What it means precisely is "as n tends towards infinity, the time tends towards a*log(n) where a is a constant scaling factor".

Or actually, it doesn't quite mean that; more likely it means something like "time divided by a*log(n) tends towards 1".

"Tends towards" has the usual mathematical meaning from 'analysis': for example, that "if you pick any arbitrarily small non-zero constant k, then I can find a corresponding value X such that ((time/(a*log(n))) - 1) is less than k for all values of n greater than X."


In lay terms, it means that the equation for time may have some other components: e.g. it may have some constant startup time; but these other components pale towards insignificance for large values of n, and the a*log(n) is the dominating term for large n.

Note that if the equation were, for example ...

time(n) = a + b*log(n) + c*n + d*n*n

... then this would be O(n squared) because, no matter what the values of the constants a, b, c, and non-zero d, the d*n*n term would always dominate over the others for any sufficiently large value of n.

That's what big O notation means: it means "what is the order of the dominant term for any sufficiently large n".
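In symbols, this is the standard definition of big O (restated here as a sketch):

f(n) = O(g(n)) \iff \exists\, M > 0,\ \exists\, N,\ \forall\, n > N :\ |f(n)| \le M \cdot |g(n)|

For the time(n) example above, taking g(n) = n*n and any constant M somewhat larger than d leaves enough slack to absorb the lower-order terms a, b*log(n) and c*n for all sufficiently large n.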

ChrisW
+27  A: 

Many good answers have already been posted to this question, but I believe we really are missing an important one - namely, the illustrated answer.

What does it mean to say that the height of a complete binary tree is O(log n)?

The following drawing depicts a binary tree. Notice how each level contains double the number of nodes compared to the level above (hence binary):

Binary tree

Binary search is an example with complexity O(log n). Let's say that the nodes in the bottom level of the tree in figure 1 represent items in some sorted collection. Binary search is a divide-and-conquer algorithm, and the drawing shows how we will need (at most) 4 comparisons to find the record we are searching for in this 16 item dataset.

Assume we had instead a dataset with 32 elements. Continue the drawing above to find that we will now need 5 comparisons to find what we are searching for, as the tree has only grown one level deeper even though we doubled the amount of data. As a result, the complexity of the algorithm can be described as a logarithmic order.

Plotting log(n) on a plain piece of paper will result in a graph where the rise of the curve decelerates as n increases:

O(log n)

Jørn Schou-Rode
+1, good illustrated answer.
fastcodejava
A: 

Simply put: at each step of your algorithm you can cut the work in half. (Asymptotically equivalent to cutting it into thirds, fourths, ...)

Brian R. Bondy
+2  A: 

The best way I've always had to mentally visualize an algorithm that runs in O(log n) is as follows:

If you increase the problem size by a multiplicative amount (i.e. multiply its size by 10), the work is only increased by an additive amount.

Applying this to your binary tree question so you have a good application: if you double the number of nodes in a binary tree, the height only increases by 1 (an additive amount). If you double it again, it still only increases by 1. (Obviously I'm assuming it stays balanced and such.) That way, instead of doubling your work when the problem size is multiplied, you're only doing very slightly more work. That's why O(log n) algorithms are awesome.
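A quick sketch of that additive growth (illustrative only): printing the height of a perfectly balanced tree, floor(log2(n)), as n doubles shows it stepping up by exactly 1 each time.

#include <stdio.h>
#include <math.h>

/* Doubling the node count n adds exactly 1 to the height floor(log2(n)). */
int main(void) {
  int n;
  for (n = 1024; n <= 1048576; n *= 2)
    printf("n = %7d  height = %d\n", n, (int)floor(log2(n)));
  return 0;
}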

DivineWolfwood
+2  A: 

http://www.youtube.com/watch?v=5zey8567bcg

I'm a lumberjack and I'm ok. What's log(n) (base b)? It is the number of times you can cut a log of length n repeatedly into b equal parts before reaching a section of size 1.
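In code, the cutting metaphor is just repeated division; a tiny sketch (illustrative only):

/* Number of times a log of length n can be cut into b equal parts
   before the pieces reach size 1 -- i.e. log base b of n. */
int cuts(int n, int b) {
  int c = 0;
  while (n > 1) {
    n /= b;
    ++c;
  }
  return c;
}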

Chad Brewbaker
I wish I could give you a point for linking to the Monty Python song hah
Andreas Grech