views:

1325

answers:

7

For binary search tree type of data structures, I see the Big O notation is typically noted as O(logn). With a lowercase 'l' in log, does this imply log base e (n) as described by the natural logarithm? Sorry for the simple question but I've always had trouble distinguishing between the different implied logarithms.

+2  A: 

Technically the base doesn't matter, but you can generally think of it as base-2.

Tim Sylvester
+4  A: 

It doesn't really matter what base it is, since big-O notation is usually written showing only the asymptotically highest order of n, so constant coefficients will drop away. Since a different logarithm base is equivalent to a constant coefficient, it is superfluous.

That said, I would probably assume log base 2.

Daniel Pryden
-1 That's wrong.
Kinopiko
@Kinopiko: What exactly is wrong about it? More precisely, how is my answer factually different from yours and others here?
Daniel Pryden
Ah, perhaps my mistake in the use of "coefficient". I will edit to clarify.
Daniel Pryden
That was my main issue with your answer. Also, it's a bit unclear what you mean by "they will still have some effect". Some effect on what?
bcat
Your answer discusses the highest order coefficients. What you said is correct as far as it goes, but that is not the reason that the logarithm base is irrelevant. The reason is that the difference between different base logarithms is a constant which is absorbed by the O().
Kinopiko
It still isn't really correct, there isn't any point in talking about highest order of n in this case. Cade Roux's answer is the correct one.
Kinopiko
@Kinopiko: ... and the constant is "absorbed" by the O() *because* we only care about the highest order coefficient of `n`. Right?
Daniel Pryden
Maybe I'm misunderstanding something. Isn't the definition of O(f(n)) the function f() that represents the asymptotic complexity of the algorithm, as a function of `n`? Asymptotic complexity is dominated by the highest-order component of the expression -- hence lower-order (or constant) expressions are superfluous. Or am I misunderstanding something fundamental?
Daniel Pryden
@Daniel: no the constant C is absorbed because, for example, O(100) = O(1).
Kinopiko
@Kinopiko: OK. I think we are saying the same thing. I would say O(100) = O(1) because O(100) = O(100 * 1) = O(C * 1) = O(1). Which is what I meant by constant expressions being superfluous. That is, the *order* of *any* constant is 1.
Daniel Pryden
+40  A: 
Cade Roux
Nice graphics 15
Kinopiko
the graphics are neat but think about the derivation of the O()-polynomial... before O() is applied, only log-base-2 is correct for binary search.
Heath Hunnicutt
@Heath Hunnicutt: No. `log_2 x` differs from `log_b x` by a constant factor `c(b)` for any base `b` independent of `x`.
Jason
Jason - you misread "before O() is applied". See my answer below also. Cade is certainly correct for the O() notation. However I am talking about the pre-O()-notation derivation and my suggestion is that there is a human-understanding reason, as opposed to a mathematical reason, that log_2 x is more clear than log x once you do translate to O().
Heath Hunnicutt
But *why* are you talking about that, when it bears no relation to the question and only serves to confuse?
hobbs
I think Heath is making a useful point, the origin of the growth relative to n is to do with log<sub>2</sub>.
Kinopiko
hobbs: Because that fact is the reason the OP was inspired to inquire. I'm trying to connect his ideas with the answer, so he understands why he had his intuition, why it does not apply to O(), but not to over-apply what he learns here to the derivation part of the analysis. The terse answers which don't address the root cause of the misunderstanding may lead to further misunderstanding. It's bad pedagogy.
Heath Hunnicutt
@Heath Hunnicutt: But the point is that it doesn't matter.
Jason
Jason: no the point is to help the OP understand. And also, on every line of the derivation leading up to the O(), it does matter. Only once you re-express in O() does it not matter. Many other readers get that point. And you do understand I know that O(log N) = O(ln N), right?
Heath Hunnicutt
@Heath Hunnicutt: If you're doing asymptotic analysis, it doesn't matter. That you wait until the last minute to throw some big-O's in doesn't change the fact that I can multiply and divide all my logarithms by some silly constant and change the base at all steps. That is, if I have some analysis that involves `log_2 n`, I can just go in and replace `log_2 n` everywhere by `log_pi 2 * log_2 n / log_pi 2` and then just end up with an analysis that has `log_pi 2 * log_pi n` everywhere. Now my analysis is in terms of `log_pi n`.
Jason
+17  A: 

Once expressed in big-O() notation, both are correct. However, during the derivation of the O() polynomial, in the case of binary search, only log2 is correct. I assume this distinction was the intuitive inspiration for your question to begin with.

Also, as a matter of my opinion, writing O(log2 N) is better for your example, because it better communicates the derivation of the algorithm's run-time.

In big-O() notation, constant factors are removed. Converting from one logarithm base to another involves multiplying by a constant factor.

So O(log N) is equivalent to O(log2 N) due to a constant factor.

However, if you can easily typeset log2 N in your answer, doing so is more pedagogical. In the case of binary tree searching, you are correct that log2 N is introduced during the derivation of the big-O() runtime.

Before expressing the result as big-O() notation, the difference is very important. When deriving the polynomial to be communicated via big-O notation, it would be incorrect for this example to use a logarithm other than log2 N, prior to applying the O()-notation. As soon as the polynomial is used to communicate a worst-case runtime via big-O() notation, it doesn't matter what logarithm is used.

Heath Hunnicutt
But it's very easy to show that `log_2 n` is in `Θ(log_a n)` for any base `a`, so I'm not sure I see how using base 2 is "more correct".
bcat
Sorry, what does "easily typeset" mean?
Kinopiko
In the first sentence of your answer.
bcat
Kinopiko -- For example, if you were writing it down by hand. However,I think O(N log N) is a lot more readable than O(N log-base-2 N), but if I were using TeX I would definitely write $log_2$ even inside the O() just to make the point that the O() derived from a polynomial that had $log_2$ and not $log$. See what I mean?
Heath Hunnicutt
I see bcat. Well you can read the rest of my answer and see what I meant, I hope. You can also see in the same sentence I did write "both are correct." I will change it to say "better" because that's more incisive.
Heath Hunnicutt
I see what you mean, although the phraseology is a little odd.
Kinopiko
bcat -- imagine you were deriving the polynomial for the runtime of binary search, BEFORE you wrote it in the O() or Theta() notations... Only log-base-2 would be correct. Once you express it as O(), you can re-express as any base, I agree. But since the polynomial used for the derivation would be wrong with any other logarithm... We forget that before you express O(), you derive a polynomial expression of the runtime. For things like binary search, we did that polynomial so long ago we have forgotten it and remember only the O() answer.
Heath Hunnicutt
Oh, I see what you're saying. That makes sense.
bcat
+1 I think this is a useful explanation.
Kinopiko
Kinopkio and bcat, thanks for helping it become useful. It was not very well-written at first. :)
Heath Hunnicutt
It is *not* true that during the derivation, log_2 is correct. It *may* be correct (e.g., binary search), but there's no a priori reason to believe that to be the case. There are plenty of problems where log_3 or even log_e show up naturally.
Jesse Beder
But this question is about binary search, so actually it is true.
Kinopiko
By the way, I just wanted to clarify that my above response is because the title of the question is "Is Big O(logn) log base e?", and the answers seem to be approaching this question generally, not specifically with respect to binary search. I'm afraid that your explanation may confuse people. If you make it very clear that you're *only* talking about binary search, then it's fine.
Jesse Beder
Well I added clarity but I sure am hurt that you think my answer might confuse people. Actually, most of the answers here didn't consider the OP's intuition and try to teach him much. I'm not so much wowed by the competition, I'm kind of sad at the low bar for pedagogy.
Heath Hunnicutt
A lot of good answers here, it's very difficult to choose the final answer. The fact that the base does not matter in Big-O notation answered my ambiguity question (everyone's answers). This answer also provided the pre O-notation derivation for the binary search tree which was the original intent of my perhaps misguided question. Thanks for the help.
BuckFilledPlatypus
Good choice 15ch
Kinopiko
Thank you BuckFilledPlatypus, your comment and selection are sincerely appreciated.
Heath Hunnicutt
"during the derivation of the O() polynomial, in the case of binary search, only log2 is correct."-1 for poor mathematics. The definition of x(n) ~ O(f(n)) says that there exists a constant c such that c*(f(n)) < x(n) for all n > n_0. Thus the constant coefficient is completely irrelevant during the analysis.
rlbond
rlbond: that was petty. You can read many other comments here and realize that I understand what you are getting at and do not agree with your pedagogy. To -1 after the OP has validated the answer, which does not disagree with you mathematically, is surely an emotional matter.
Heath Hunnicutt
@Kinopiko - "easily typeset" means it's easier to write log() if you can't easily type a subscript in the editor (like in this one)
Martin Beckett
These comments are getting too long.
Kinopiko
I think your answer is misleading. I gave it a -1 because I felt it deserved -1. Don't take it personally if someone disagrees with your answer, it's still on the top, and it's just a community wiki anyway!And, questions aren't done once the OP accepts the answer. It's still the responsibility of the community to judge the fitness of each answer so other people can access the information.
rlbond
The worst case runtime for a search of a (completely unbalanced) binary tree is O(n), the worst case runtime for a search of a balanced binary tree is O(log n).
Cade Roux
@rlbond -- While your philosophy may be correct, your remark about my math is wrong. The log is base 2 *during the derivation* and it doesn't matter only afterward. My statement is not mathematically incorrect, rather your suggesting that it is so.
Heath Hunnicutt
Since log2(x) is equal to log10(x)/log10(2), you can derive it either way. The log is not strictly base 2 at any point.
rlbond
You realize by during the derivation, I mean derivation of the runtime polynomial, prior to application of O()-notation? When deriving the function to which we apply the O()-notation, if the tree is binary, the logarithm is also.
Heath Hunnicutt
+17  A: 

It doesn't make any difference. logx n = C logy n for any x and y, where C is a constant which the O() makes irrelevant.

O notation only deals with the rate of growth relative to n, so

O(1000,000 n2) = O(0.0000000001 n2) = O(n2),

so here, because

logen = loge2 log2 n,

O(logen) = O(log2 n)

since loge2 is a constant.

Kinopiko
A: 

Yes, when talking about big-O notation, the base does not matter. However, computationally when faced with a real search problem it does matter.

When developing an intuition about tree structures, it's helpful to understand that a binary search tree can be searched in O(n log n) time because that is the height of the tree - that is, in a binary tree with n nodes, the tree depth is O(n log n) (base 2). If each node has three children, the tree can still be searched in O(n log n) time, but with a base 3 logarithm. Computationally, the number of children each node has can have a big impact on performance (see for example: link text)

Enjoy!

Paul

Paul
A: 

Both are correct. Think about this

log2(n)=log(n)/log(2)=O(log(n))
log10(n)=log(n)/log(10)=O(log(n))
logE(n)=log(n)/log(E)=O(log(n))