We have two databases, each of size n, containing numbers without repeats, so 2n elements in total. They can only be accessed through queries, one database at a time: you give a query a number k and it returns the kth smallest entry in that database. We need to find the nth smallest entry among all 2n elements in O(log n) queries. The idea is to use divide and conquer, but I need some help thinking through this. Thanks!
I saw this problem on a qualifying exam not long ago, and it smells like a homework problem. I will therefore make only two suggestions:
1. Study binary search, with careful attention to the precise nature of the loop invariants. Jon Bentley's book Programming Pearls has a good explanation.
2. Try generalizing the invariants for binary search.
Draw a picture with various special cases:
- Every number in DB1 is smaller than every number in DB2
- Vice versa
- DB1 equals DB2
- Every number in DB2 is twice the corresponding number in DB1
This is a fairly hard problem; you'll have an easier time if you go straight for a proof of correctness.
Tips:
- Observe your "databases" are really two sorted arrays.
- Take elements from the "middle" of the arrays and compare them. The result of the comparison might allow you to "rule out" some part.
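As a hedged sketch of that "rule out" step (names and the list-based interface are my own; plain sorted lists stand in for the query-only databases):

```python
def rule_out_step(a, b, k):
    """One divide-and-conquer step on two sorted lists: discard a prefix
    that provably lies below the kth smallest of the combined lists, and
    return the reduced lists plus the adjusted rank. Assumes k >= 2."""
    m = k // 2
    ia = min(m, len(a)) - 1          # candidate index in a (or -1 if a is empty)
    ib = min(m, len(b)) - 1          # candidate index in b (or -1 if b is empty)
    if ib < 0 or (ia >= 0 and a[ia] < b[ib]):
        # a[ia] has at most ia + ib <= k - 2 elements below it in the
        # combined lists, so a[0..ia] can never be the kth smallest.
        return a[ia + 1:], b, k - (ia + 1)
    return a, b[ib + 1:], k - (ib + 1)
```

Each step removes roughly k/2 candidates with a single comparison, which is what drives the logarithmic bound discussed in the next answer.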
Here's how I thought this through. Since it's an educational problem, I suggest you stop reading if any of the points makes you think, "aha, I hadn't thought of that, I can think further along those lines by myself".
1) Same observation as sdcwc: with a possible slight quibble over whether it starts at 0 or 1, the database can be thought of as a sorted array. I ask for element 0, I get the smallest. I ask for 12, I get the 13th smallest. And so on. I find this easier to visualise.
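To make that model concrete, a hypothetical stand-in for one "database" (the function name and list representation are mine):

```python
def query(db, k):
    """Model of one database query, 0-indexed as in the convention above:
    query(db, 0) is the smallest element, query(db, 12) the 13th smallest.
    The sort simulates the database's internal order."""
    return sorted(db)[k]
```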
2) We know we're looking for an O(log n) algorithm. That means, roughly speaking, we're looking for one of two things:
Either we start out by calculating the smallest element, then calculate the 2nd smallest, 4th smallest, 8th, etc, until we're up to the size we want, each step in constant time. This doesn't sound promising to me.
Or we start out with a problem of size n then we perform some constant-time operation which allows us to solve the original problem by solving a problem of size n/2. Obviously we can solve the problem for n=1 in constant time, to complete the algorithm. This sounds a bit more plausible.
Actually it doesn't necessarily have to be n/2 each time. It could be n/3, or 999*n/1000, and the result will still be O(log n). But there's no harm in looking for n/2 first.
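A throwaway illustration of why repeated halving gives O(log n) (the function is mine, not part of any answer's algorithm):

```python
def halving_steps(n):
    """Count how many constant-time halvings reduce a problem of size n
    to size 1 -- floor(log2(n)) steps, hence the O(log n) bound.
    Dividing by 3 (or shrinking to 999n/1000) would still be logarithmic,
    just with a different constant factor."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps
```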
3) How are we going to reduce the problem like that? Well, if we can discount/remove m elements from the start of one array or the other as being smaller than the kth smallest, then we can find the (k-m)th smallest element of the resulting pair of arrays, and it will be the kth smallest of the original arrays.
4) Finally, the breakthrough observation is that if the mth smallest element of array A is smaller than the mth smallest element of array B, then that mth element of A cannot possibly be the (2m)th smallest element of the two arrays combined. It's smaller than that (or of equal value: I'm not sure whether "no repeats" means "no repeats in each database", or "no repeats between the databases combined"), since there are at most 2*(m-1) elements strictly smaller than it in both arrays combined.
Unless I've made an error, the rest is coding. With a small extra argument to account for the off-by-1 when k is odd, this solution is actually O(log k), which is O(log n) since k <= 2*n.
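A hedged end-to-end sketch of the scheme described above (plain sorted lists stand in for the query-only databases; the names are mine, and this is one way to realize the reduction, not necessarily the answerer's exact coding):

```python
def kth_smallest(a, b, k):
    """Return the kth smallest (1-based) element of two sorted lists
    using O(log k) comparisons: each round looks at position ~k//2 of
    each remaining list and discards a prefix that provably lies below
    the kth smallest."""
    la, lb = 0, 0                      # prefixes a[:la], b[:lb] are ruled out
    while True:
        if la == len(a):               # a exhausted: answer must be in b
            return b[lb + k - 1]
        if lb == len(b):               # b exhausted: answer must be in a
            return a[la + k - 1]
        if k == 1:                     # base case: smallest remaining element
            return min(a[la], b[lb])
        m = k // 2
        ia = min(la + m, len(a)) - 1   # candidate index in a
        ib = min(lb + m, len(b)) - 1   # candidate index in b
        if a[ia] < b[ib]:
            k -= ia - la + 1           # everything up to a[ia] is too small
            la = ia + 1
        else:
            k -= ib - lb + 1           # everything up to b[ib] is too small
            lb = ib + 1
```

For the original question, call kth_smallest(a, b, n) with n the size of each database; the two indexing operations per round correspond to two queries, keeping the total at O(log n).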