views:

115

answers:

6
+1  Q: 

Binary Search Help

Hi, for a project I need to implement a binary search. This binary search allows duplicates. I have to get all the index values that match my target. I've thought about doing it this way if a duplicate is found to be in the middle:

Target = G Say there is this following sorted array:

B, D, E, F, G, G, G, G, G, G, Q, R S, S, Z

I get the mid which is 7. Since there are target matches on both sides, and I need all the target matches, I thought a good way to get all would be to check mid + 1 if it is the same value. If it is, keep moving mid to the right until it isn't. So, it would turn out like this:

B, D, E, F, G, G, G, G, G, G (MID), Q, R S, S, Z

Then I would count from 0 to mid to count up the target matches and store their indexes into an array and return it.

That was how I was thinking of doing it if the mid was a match and the duplicate happened to be in the mid the first time and on both sides of the array.


Now, what if it isn't a match the first time? For example:

B, D, E, F, G, G, J, K, L, O, Q, R, S, S, Z

Then as normal, it would grab the mid, then call binary search from first to mid-1.

B, D, E, F, G, G, J

Since G is greater than F, call binary search from mid+1 to last.

G, G, J.

The mid is a match. Since it is a match, search from mid+1 to last through a for loop and count up the number of matches and store the match indexes into an array and return.

Is this a good way for the binary search to grab all duplicates? Please let me know if you see problems in my algorithm and hints/suggestions if any. The only problem I see is that if all the matches were my target, I would basically be searching the whole array but then again, if that were the case I still would need to get all the duplicates.

Thank you

BTW, my instructor said we cannot use Vectors, Hash or anything else. He wants us to stay on the array level and get used to using them and manipulating them.

+2  A: 

Why do you want to get away from what makes Binary search so great as soon as you find a match? Why not use a binary search on the upper half and narrow it down until you don't get any more G's, then do the same with the lower. That way in the worst case you're not searching the whole array. You find the min and max index in this way, then store those plus all of the intermediate ones in an array.

MBennett
If I understand what you're saying, your answer would work in a case like this:B D G G G G G G G G G G G S ZLower half:B D G G G G G -> B D G where mid + 1 would be the min index.Upper half:G G G G G S Z -> G S Z where mid - 1 would be the max index.Is this right? If it is though, I'm not sure it would work in this case:B D E F G G G G G H Q R S S ZLower half:B D E F G G G -> G G G where Mid - 1 would have to be the min index.Upper half:G H Q R S S Z -> G H Q where mid - 2 would have to be the upper index.Am I understanding what you said, correctly?
aloh
No. Using your example to clarify:B D G G G G G G G G G G G S Zmid = G (index 7), now check the top half G G G G G S Z.mid = G (index 11), now check the top half G S Z.mid = S, not G, so check the bottom half Gmid = G (index 12), and since there is no more to check, index 12 gives us our maximum index.Now we repeat this process on the lower half and get that the minimum index is 2. Then we know that the indices between 2 and 12 inclusive are all G.
MBennett
Thank you. I understand now with the example you used. However, what about with the second example?B D E F G G G G G H Q R S S Z. Upper half: G H Q R S S Z mid = R, -> check G H Q mid = H, -> G so max index would be 8. However, for the bottom half: B D E F G G G mid = F, -> check G G G mid = G. In this case the mid is a match but it is not the min index. Or, maybe I am doing something wrong... :(
ShrimpCrackers
You have the right idea, but you keep iterating the process (or using recursion if you prefer) until you have the final index. This is why in the example above the top half is checked multiple times. For the example you just gave the process for the bottom half is as follows:B D E F G G G, mid = F (index 3), check G G G, mid = G. Now the mid is a match, but we don't just want a match, we want the first G. So we check the bottom half, and get G again (index 4), and since we have no top or bottom half to check we get that the minimum index is 4.
MBennett
Bennett, I think I understand completely what you mean now. I'm going to go try and put this in code. Thank you so much.Darn, I accidentally posted this as an unregistered user (I'm logged in now) and I'd like to accept it as the answer. Is there anyway to get it accepted as the best answer?
ShrimpCrackers
A: 

Check out this past question. The accepted answer uses the STL, but some of the other answers have some info that might help you.

Justin Peel
+2  A: 

Ref to the source code of lower_bound function in stl.

If you have a copy of Programming Pearls at hand or in the school library, refer to the 4th chapter and its solutions at the end of the book.

Yin Zhu
Since this is homework and his instructor wants them to get comfortable manipulating arrays, I would assume that STL is out of play.
MBennett
stl has source code.
Yin Zhu
I think the answerer meant "look at how the function is implemented".
jkff
My mistake. This is good advice then.
MBennett
A: 

Notice that you're not searching for an element X, but you're searching for a boundary Y,X where Y < X and symmetrically for X,Z where X < Z. These boundaries may also be found with binary search in an imaginary array consisting of inter-element boundaries.

jkff
A: 

test post test post test post trying to log back in to accept answer

aloh
A: 

I answered this question with a solution that uses a modified binary search algorithm for arrays. As this is homework, don't click the link unless you want it spoiled, but the gist is that by tinkering with the conditions in a binary search loop, you can get it to do any of the following 3 behaviors:

  1. return a match the moment it finds one (normal behavior, match can be anywhere in a run of identical values)
  2. return the left-most match
  3. return the right-most match

Then your question is answered by running this binary search twice, once to find the left-most match and then again to find the right-most match (only if the first run succeeded). The difference between the two results, i.e. match indices, is 1 less than the number of total matches found.

Phil