views:

299

answers:

7

Let's say you have some set of numbers with a known lower bound and unknown upper bound, i.e. 0, 1, 2, 3, ... 78 where 78 is the unknown. Assume for the moment there are no gaps in between numbers. There is a time-expensive function test() that tests if a number is in the set.

What is an efficient way (requiring a low amount of test() calls) to find the highest number in the set?

What if you have the added knowledge that the upper bound is 75 +/- 25?

What if there are random gaps between numbers in the set, i.e. 0, 1, 3, 4, 7, ... 78?

A: 

Maybe you should just traverse through it? It would be O(n) complex. I think there is no other way to do this.

Migol
A: 

Do you know the set size, before hand?

Actually, I guess you probably don't - otherwise the first problem would be trivial.

It would help if you had some idea how big the set was though.

  1. Take a guess at the top value
  2. Test - if in then increment value by some amount
  3. If not in then decrease value by some amount
  4. Once you have upper and lower bounds for largest value, binary search till you find it (to required precision).

For the gaps you've no such ability - you can't even tell when you've found the largest element. (Unless you known the maximum gap size)

Douglas Leeder
With some margin of error, yes.
Stewart
How is this any better than a one-shot traverse on the set?
Yuval A
Yuval: because traversing the set isn't offered as an option. A linear search would require traversing the entire (size unknown) domain.
James Curran
+1  A: 

Your best bet is to simply run through the set with O(n) complexity, which is not bad.

Take into consideration that the set is not sorted (it is a set, after all, and this is the given), each isInSet(n) operation takes O(n) as well, bringing you to O(n^2) for the entire operation, if you choose any algorithm for prodding the set at certain places...

A much better solution, if the set is in your control, would be to simply keep a max value of the set and update it on each insertion to the set. This will be O(1) for all cases.

Yuval A
What if the set is sorted?
Stewart
Then it is not a set anymore... and then you have immediate access to the max value.
Yuval A
+3  A: 

For the "no gaps case":

  • I assume that this is a fixed size of number, e.g. a 32 bit int
  • We wish to find x such that test(x) == true, test(x+1) == false, right?

You basically do a binary chop between the lowest known "not in set" (e.g. the biggest 32 bit int) and the highest known "in set" (starting with the known lower bound) by testing the middle value in the range each time and adjusting the boundaries accordingly. This would give an O(log N) solution (in terms of numbers of calls to test()) where X is the size of the potential set, not the actual set. This will be slower than just trying 1, 2, 3... for small sets, but much faster for large ones.

All of this falls down if there can be gaps, at which point I don't think there's any feasible solution beyond "start with the absolute highest possible number and work down until test(x) == true at which point that's the highest number". Any other strategy will fail or be more expensive as far as I can see.

Jon Skeet
Jon, take into consideration that you must demand BOTH test(x)==true AND test(y)==false FOR EACH y>x
Yuval A
Sorry, I missed the disclaimer at the beginning :)
Yuval A
@Yuval, JON SKEET IS NEVER WRONG!
Triptych
A: 

If there are no gaps, then you are probably best off with a binary search.

If we use the second assumption, that the top is 75 +/- 25, then are Low end is 50 and high end is 100, and our first test case is 75. If it is present, then the low end is 75 and the high end is 100, and our test case is 87. That should yield results in O( ln N) (where here N would be 50).

If we can't assume a possible upper range, we just have to made educated guess at what it might be. If a value is not found, it becomes the high end. If it is found, it's the low end, and we double it to find the high end.

If there are gaps, the only way I can see of doing it is a linear search -- but even then you'll need a way of knowing when you reached the end, rather that just a big gap.

James Curran
+1  A: 
  1. Set Step to 1
  2. set Upper to Lower + Step
  3. if test(Upper) is true then set Lower to Upper, multiply Step by 2 and go to point 2
  4. at this point you know that Lower is in your set while Upper is not. You can now do a binary search between Lower and Upper to find the limit.

This looks like O(log n * O(test)) complexity.

If you know that Upper is between 50 and 100, Do a binary search between these two values.

If you have random gaps and you know that the upper bound is 100 maximum I suspect you can not do better than starting from there and testing every number one by one until test() finds a value in your set.

If you have random gaps and you do not know an upper limit then you can never be sure you found the upper bound.

kmkaplan
Try O(nlogn). Each search takes O(n).
Yuval A
Yuval: O(2 *logN). The two searches kmkaplan describes are each O(log N), and are sequential, not nested.
James Curran
This is actually the best answer here. You don't need to guess an upper bound.
Svante
A: 

If your set happens to be the set of prime numbers, let me know when you find the biggest one. I'm sure we can work something out. ;)

But seriously, I'm guessing you know for a fact that the set does indeed have a largest value. Or, you're chopping it to a 32-bit integer.

A couple of suggestions:

1) Think of every case you can that would speed a result of test(x) == false. Then you can go on to the next one. If the time you spend going through all of the ejection cases is far less than going through the full test, then you'll come out ahead. 2) Can you gain any information from each test? For example, does test(x) == false imply that test(x+5679) == false as well?

John at CashCommons