I'm studying for upcoming interviews and have encountered this question several times (written verbatim)
Find or determine non existence of a number in a sorted list of N numbers where the numbers range over M, M >> N and N large enough to span multiple disks. Algorithm to beat O(log n); bonus points for constant time algorithm.
First of all, I'm not sure if this is a question with a real solution. My colleagues and I have mused over this problem for weeks and it seems ill formed (of course, just because we can't think of a solution doesn't mean there isn't one). A few questions I would have asked the interviewer are:
- Are there repeats in the sorted list?
- What's the relationship to the number of disks and N?
One approach I considered was to binary search the min/max of each disk to determine the disk that should hold that number, if it exists, then binary search on the disk itself. Of course this is only an order of magnitude speedup if the number of disks is large and you also have a sorted list of disks. I think this would yield some sort of O(log log n) time.
As for the M >> N hint, perhaps if you know how many numbers are on a disk and what the range is, you could use the pigeonhole principle to rule out some cases some of the time, but I can't figure out an order of magnitude improvement.
Also, "bonus points for constant time algorithm" makes me a bit suspicious.
Any thoughts, solutions, or relevant history of this problem?