If I say to you:

"I am thinking of a number between 0 and n, and I will tell you if your guess is high or low", then you will immediately reach for binary search.

What if I remove the upper bound? That is, I am thinking of a positive integer, and you need to guess it.

One possible method would be for you to guess 2, 4, 8, ..., until you guess 2**k for some k and I say "lower". Then you can apply binary search.
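
For concreteness, here's a minimal sketch of that two-phase strategy in Python (the `oracle` callback is a hypothetical stand-in for my "higher/lower" answers):

```python
def guess_unbounded(oracle):
    """Find a hidden positive integer.

    `oracle(g)` (hypothetical) returns 'low' if the guess g is below
    the target, 'high' if above, and 'correct' if it equals it.
    """
    # Phase 1: keep doubling until a guess overshoots the target.
    hi = 1
    while True:
        answer = oracle(hi)
        if answer == "correct":
            return hi
        if answer == "high":
            break
        hi *= 2
    # Phase 2: binary search strictly between the last two guesses.
    lo = hi // 2 + 1
    hi -= 1
    while lo <= hi:
        mid = (lo + hi) // 2
        answer = oracle(mid)
        if answer == "correct":
            return mid
        if answer == "low":
            lo = mid + 1
        else:
            hi = mid - 1
    raise ValueError("oracle answered inconsistently")

target = 937  # arbitrary example
respond = lambda g: "correct" if g == target else ("low" if g < target else "high")
print(guess_unbounded(respond))  # finds 937 in about 2*log2(937) guesses
```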

Is there a quicker method?

EDIT:

Clearly, any solution is going to take a number of guesses that grows with the target number. If I chuck Graham's number through the Ackermann function, we'll be waiting a while whatever strategy you pursue.

I could offer this algorithm too: Guess each integer in turn, starting from 1.

It's guaranteed to finish in a finite amount of time, and yet it's clearly much worse than my "powers of 2" strategy. If I can find a worse algorithm (and know that it is worse), then maybe I could find a better one?

For example, instead of powers of 2, maybe I can use powers of 10. Then I find the upper bound in about log_10(n) steps instead of log_2(n) steps, but I then have a bigger range to search. Say k = ceil(log_10(n)). The binary search needs about log_2(10**k - 10**(k-1)) = log_2(9) + (k-1)*log_2(10) steps, i.e. roughly 3.3k. For powers of 2, the search phase takes about log_2(n) steps, which is also roughly 3.3k. Which wins?
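
To make that comparison concrete, here is a rough counting sketch (my own model, ignoring off-by-one details): probe base**1, base**2, ... until overshooting, then charge log_2 of the final interval for the binary search.

```python
import math

def guesses_for_base(n, base):
    """Rough query count for the 'powers of base' strategy on target n:
    probes to find an upper bound, plus a binary search of the last
    interval. Off-by-one details are ignored."""
    k, probes = 1, 1            # count the final, overshooting probe
    while base ** k < n:
        k += 1
        probes += 1
    interval = base ** k - base ** (k - 1)
    return probes + math.ceil(math.log2(max(interval, 1)))

for n in (10**3, 10**6, 10**12):
    print(n, guesses_for_base(n, 2), guesses_for_base(n, 10))
```

By this count the larger base wins by a constant factor (roughly 39 vs 26 guesses for n = 10**6): the search phase costs about log_2(n) either way, and only the upper-bound phase shrinks.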

What if I search upwards using a faster-growing sequence, like k**k? Or some other sequence? Does the prize go to whoever can find the sequence that grows the fastest? Is this a problem with an answer?

Thank you for your thoughts. And my apologies to those of you suggesting I start at MAX_INT or 2**32-1, since I'm clearly drifting away from the bounds of practicality here.

FINAL EDIT:

Hi all,

Thank you for your responses. I accepted the answer by Norman Ramsey (and commenter onebyone) for what I understood to be the following argument: for a target number n, any strategy must be capable of distinguishing between (at least) the numbers from 0..n, which means you need at least on the order of log(n) comparisons, i.e. Omega(log(n)).
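
(Spelling the argument out as I understood it: each guess elicits one of three responses - higher, lower, or correct - so after k guesses a strategy has seen one of at most 3**k distinct response sequences. Distinct targets must produce distinct sequences, so telling apart the numbers 0..n requires 3**k >= n+1, i.e. k >= log_3(n+1), which is on the order of log(n).)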

However, several of you also pointed out that the problem is not well-defined in the first place, because it's not possible to pick a "random positive integer" under a uniform probability distribution (or, rather, a uniform probability distribution cannot exist over a countably infinite set). And once I give you a non-uniform distribution, you can split its probability mass in half and apply binary search as normal.

This is a problem that I've often pondered as I walk around, so I'm pleased to have two conclusive answers for it.

+14  A: 

If there truly is no upper bound, and all numbers all the way to infinity are equally likely, then there is no optimum way to do this. For any finite guess G, the probability that the number is lower than G is zero and the probability that it is higher is 1 - so there is no finite guess that has an expectation of being higher than the number.

RESPONSE TO JOHN'S EDIT:

By the same reasoning that powers of 10 are expected to be better than powers of 2 (there's only a finite number of possible Ns for which powers of 2 are better, and an infinite number where powers of 10 are better), powers of 20 can be shown to be better than powers of 10.

So basically, yes, the prize goes to fastest-growing sequence (and for the same sequence, the highest starting point) - for any given sequence, it can be shown that a faster growing sequence will win in infinitely more cases. And since for any sequence you name, I can name one that grows faster, and for any integer you name, I can name one higher, there's no answer that can't be bettered. (And every algorithm that will eventually give the correct answer has an expected number of guesses that is infinite, anyway).

caf
Well, if I use powers of 2 then it will take ceil(log2(n)) guesses to find the upper bound, then about the same number to find the actual number. Or I could use, say, powers of 10, in which case I'll find an upper bound faster, but have a bigger range to search. Or I could guess along a sequence like k**k, which grows very fast but would leave me a much bigger range to search eventually. Certainly you can't get an upper bound in O(1) guesses, but that doesn't mean all strategies are equally bad.
John Fouhy
The problem is that if `n` can take on any integer in the range `[0, inf)` with equal probability, then the expected value of both `log2(n)` and `log10(n)` is also infinity - meaning that the expected number of guesses to find the upper bound is infinite in both cases, so they really are equally bad (and "as bad as possible", which also happens to be "as good as possible" :)
caf
"all numbers all the way to infinity are equally likely" - that is not possible (as a simple result in probability theory), and hence not the case.
Steve Jessop
Nonetheless, it is what the question (implicitly) specified, so I chose to treat it as a Gedankenexperiment (in the same vein as something like Newcomb's Problem).
caf
There is no probability distribution on the integers that satisfies your claim that for all guesses G, the probability that the target number is lower than G is zero.
I. J. Kennedy
A: 

I'd probably start my guessing with Graham's Number.

Cinder6
In the grand scheme of things, this number is minuscule.
Jason
Oh, totally, but it's still bigger than any number (literal) anybody is likely to just name off the top of their head, unless they try to get clever and say "g64 + 1".
Cinder6
How about A(g64, g64), as per XKCD?
Cinder6
A: 

The practical answer within a computing context would be to start with whatever is the highest number that can (realistically) be represented by the type you are using. In the case of some BigInt type you'd probably want to make a judgement call about what is realistic... obviously the ultimate bound in that case is the available memory... but performance-wise something smaller may be more realistic.

jerryjvl
+1  A: 

My main refinement is that I'd start with a higher first guess instead of 2, around the average of what I'd expect them to choose. Starting with 64 would save 5 guesses vs starting with 2 when the number's over 64, at the cost of 1-5 more when it's less. 2 makes sense if you expect the answer to be around 1 or 2 half the time. You could even keep a memory of past answers to decide the best first guess. Another improvement could be to try negatives when they say "lower" on 0.

David
As @caf pointed out, it's (almost) irrelevant what your first guess is if the upper limit is infinity. ANY finite number has (almost) zero chance of being larger than the actual number. But that's a mathematical problem and answer, not a computing one.
Eric J.
@Eric, human factors come into play here as well. Ask any average Joe on the street to pick an unbounded number and 80% of them would probably *still* choose "7" :-)
paxdiablo
And the other 20% would decompose in the following manner: 10% will choose 13, 5% would choose 666, 4% would choose 10 and 1% would choose 42
Vinko Vrsalovic
@Pax: Actually, the choice would be seventeen (http://tinyurl.com/lzsurn).
Jason
+2  A: 

Use binary search starting with MAX_INT/2, where MAX_INT is the biggest number your platform can handle.

No point in pretending we can actually have infinite possibilities.

UPDATE: Given that you insist on entering the realms of infinity, I'll just vote to close your question as not programming related :-)

Vinko Vrsalovic
The possibilities might not be infinite, but if the size of an integer in the given implementation is bounded only by the available memory, starting with MAX_INT/2 will be very inefficient: you will take up almost all of the memory with the first guess, and calculating with such big numbers will be very costly. So in that case it would be better to start with a low number and work your way up.
sepp2k
If the number chosen is truly random (and not chosen by a human), half of the time you'd get a number greater than MAX_INT/2, so starting from a lower number would only waste more time and still consume that much memory and computation on half of the runs, while incurring a big startup cost only on the other half (as it'll be reduced in size quickly).
Vinko Vrsalovic
+1  A: 

If this is guessing the upper bound of a number being generated by a computer, I'd start with 2**[number of bits/2], then scale up or down by powers of two. This, at least, gets you the closest to the possible values in the least number of jumps.

However, if this is a purely mathematical number, you can start with any value, since you have an infinite range of values, so your approach would be fine.

Reed Copsey
+3  A: 

Mathematically speaking:

You cannot ever correctly find this integer. In fact, strictly speaking, the statement "pick any positive integer" is meaningless as it cannot be done: although you as a person may believe you can do it, you are actually picking from a bounded set - you are merely unconscious of the bounds.

Computationally speaking:

Computationally, we never deal with infinities, as we would have no way of storing or checking against any number larger than, say, the theoretical maximum number of electrons in the universe. As such, if you can estimate a maximum based on the number of bits used in a register on the device in question, you can carry out a binary search.

Mala
integers are only countably infinite, so by induction you can always pick one. The axiom of choice is only relevant for non-countably infinite sets (such as reals)
Chris Dodd
Ah right, thanks for pointing that out :)
Mala
@Chris Dodd: No. The axiom of choice is relevant to products of nonempty sets (the axiom states that such a product is not empty).
Jason
A: 

Your starting point should be the largest number you can think of plus 1.

There is no 'efficient search' for a number in an infinite range.

EDIT: Just to clarify, for any number you can think of there are still infinitely more numbers that are 'greater' than your number, compared to a finite collection of numbers that are 'less' than your number. Therefore, assuming the chosen number is randomly selected from all positive numbers, you have zero (or approaching-zero) chance of being 'above' the chosen number.

Kirk Broadhurst
I'm having trouble finding the largest number I can think of plus 1. :-)
Adam Liss
Obviously any algorithm will take time proportional to the target number. I could give you this algorithm: "Guess each integer in turn until you find it". This is clearly worse than the algorithm in my post. If I can supply a worse algorithm, can I not find a better one?
John Fouhy
I disagree. The suggested algorithm (guess each integer in turn) is not 'clearly worse' than the binary algorithm, any more than 1 x 0 is smaller than 100 x 0. With an unbounded range, the chance of finding the number is zero.
Kirk Broadhurst
+3  A: 

Worst case, you can find it in a number of guesses logarithmic in the answer using exactly the methods you describe. You might use Ackermann's function to find an upper bound in fewer probes, but then the binary search between that guess and the previous one requires a number of guesses logarithmic in the size of the interval, which (precisely because the guesses grow so quickly) is no better than logarithmic in the answer, and possibly far worse.

It would be interesting to try to prove that there is no faster algorithm (e.g., O(log log n)), but I have no idea how to do it.

Norman Ramsey
An O(log log n) algorithm would receive only O(log log n) bits of information (the answers to its guesses), yet would have to be capable of outputting any of n possibilities. QED. I think. Basically the same argument as you use to prove that a general comparison sort needs at least on the order of n log n comparisons.
Steve Jessop
+1  A: 

The standard default assumption of a uniform distribution for all positive integers doesn't lead to a solution, so you should start by defining the probability distribution of the numbers to guess.

starblue
+2  A: 

Binary search can be generalized: at each step, the set of possible choices should be divided into two subsets of probability 0.5 each. In that form it's still applicable to infinite sets, but it still requires knowledge of the distribution (a requirement that is quite often forgotten for finite sets)...
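
For concreteness, a minimal sketch of that generalization (not from the answer; `cdf` and `oracle` are hypothetical). With the geometric distribution p(n) = (1/2)**n used in Steve Jessop's answer below, splitting the probability mass in half reproduces plain sequential guessing:

```python
def median_split_guess(cdf, oracle):
    """Generalized binary search over the positive integers.

    `cdf(n)` is the assumed probability that the target is <= n;
    `oracle(g)` answers 'low', 'high', or 'correct' as in the question.
    """
    lo, hi = 1, None                 # current candidate interval [lo, hi]
    while True:
        mass_lo = cdf(lo - 1)
        mass_hi = 1.0 if hi is None else cdf(hi)
        halfway = (mass_lo + mass_hi) / 2
        g = lo                       # smallest g with cdf(g) >= halfway
        while cdf(g) < halfway and (hi is None or g < hi):
            g += 1
        answer = oracle(g)
        if answer == "correct":
            return g
        if answer == "low":
            lo = g + 1
        else:
            hi = g - 1

cdf = lambda n: 1.0 - 0.5 ** n       # geometric prior, p(n) = (1/2)**n
target = 3
respond = lambda g: "correct" if g == target else ("low" if g < target else "high")
print(median_split_guess(cdf, respond))  # guesses 1, 2, 3 in turn
```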

maxim1000
+5  A: 

People (who have never studied probability) tend to think that "pick a number from 1 to N" means "with equal probability of each", and they act according to their intuitive understanding of probability.

Then when you say "pick any positive integer", they still think it means "with equal probability of each".

This is of course impossible - there exists no discrete probability distribution with domain the positive integers, where p(n) == p(m) for all n, m.

So, the person picking the number must have used some other probability distribution. If you know anything at all about that distribution, then you must base your guessing scheme on that knowledge in order to have the "fastest" solution.

The only way to calculate how "fast" a given guessing scheme is, is to calculate its expected number of guesses to find the answer. You can only do this by assuming a probability distribution for the target number. For example, if they have picked n with probability (1/2) ^ n, then I think your best guessing scheme is "1", "2", "3",... (average 2 guesses). I haven't proved it, though, maybe it's some other sequence of guesses. Certainly the guesses should start small and grow slowly. If they have picked 4 with probability 1 and all other numbers with probability 0, then your best guessing scheme is "4" (average 1 guess). If they have picked a number from 1 to a trillion with uniform distribution, then you should binary search (average about 40 guesses).
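
For what it's worth, that average of 2 guesses checks out numerically under the assumed prior p(n) = (1/2)**n: the sequential scheme needs exactly n guesses for target n, so the expectation is the sum of n * (1/2)**n.

```python
# Expected guesses for the scheme "1", "2", "3", ... when the target
# was drawn with probability p(n) = (1/2)**n. The infinite sum equals 2;
# truncating at n = 200 is accurate to well past double precision.
expected = sum(n * 0.5 ** n for n in range(1, 201))
print(expected)  # approximately 2.0
```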

That is, I think, the only way to define "fast" via expectation; alternatively you could look at the worst case. You then have to assume a bound on the target, to prevent all schemes having the exact same speed, namely "no bound on the worst case". But you don't have to assume a distribution, and the answer for the "fastest" algorithm under this definition is obvious - binary search starting at the bound you selected. So I'm not sure this definition is terribly interesting...

In practice, you don't know the distribution, but can make a few educated guesses based on the fact that the picker is a human being, and what numbers humans are capable of conceiving. As someone says, if the number they picked is the Ackermann function for Graham's number, then you're probably in trouble. But if you know that they are capable of representing their chosen number in digits, then that actually puts an upper limit on the number they could have chosen. But it still depends what techniques they might have used to generate and record the number, and hence what your best knowledge is of the probability of the number being of each particular magnitude.

Steve Jessop
+1  A: 

Since you do not specify any probability distribution of the numbers (as others have correctly mentioned, there is no uniform distribution over all the positive integers), the No Free Lunch theorem gives the answer: any method (that does not repeat the same number twice) is as good as any other.

Once you start making assumptions about the distribution (e.g. that it is a human being, or a binary computer, that chooses the number) this of course changes, but as the problem is stated, any algorithm is as good as any other when averaged over all possible distributions.

Rasmus Faber