views:

237

answers:

5

The problem is to extend the binary search algorithm to find all occurrences of a target value in a sorted array in the most efficient way. Concretely speaking, the input of the algorithm are (1) a sorted array of integers, where some numbers may appear more than once, and (2) a target integer to be searched. The output of the algorithm should be a pair of index values, indicating the first and last occurrence of the integer in the array, if it does occur. The source code could be in c#, c, c++.

Also what is the max and min number of comparisons that we might need to find the indexes.

+5  A: 

For C++, you could look up std::equal_range() and its complexity requirements. As long as you're interested in the basic algorithm, the same general rules should apply regardless of the language use for the implementation.

Jerry Coffin
+1: Everything that needs to be said, and all that should be said.
Potatoswatter
Specifically, looking at the implementation of lower_bound and upper_bound would give great insight into how to do this correctly.
MSN
A: 

I imagine that the normal algorithm would have something like this in it: if(value == test) return; if(value < test) min = i; if(value > test) max = i;

once you have used this to find one of the values. perform two more slightly moded binary searches using the min and max you currently have to find the tips.

to find the top most replace the above with:

if(value <= test) min = i; if(value > test) max = i;

for the bottom most replace with:

if(value >= test) max = i; if(value < test) min = i;

note there is no early return using this method, you just keep going until min and max are like one or something apart, I suppose you could add one with another check if( value == test and arr[i-1] != test) return; ect.

matt
thanks for the suggestion.....I am using the following approach:BinarySearch(A[0..N-1], value, low, high) { if (high < low) return -1 // not found mid = low + ((high - low) / 2) if (A[mid] > value) return BinarySearch(A, value, low, mid-1) else if (A[mid] < value) return BinarySearch(A, value, mid+1, high) else return mid // found.Now lets say the index of key value is found using this approach, then how do we find the index of first and last value of the same index??
iecut
thanks for all your help
iecut
A: 

If you are a little clever you can define two different binary search functions. One will return the index of the first appearance of the searched for value and the other will return the last appearance of the searched for value. From your knowledge of binary search, you should be able to determine the maximum and minimum number of comparisons.

Using two binary searches should be the fastest method on average in my opinion. For instance, if you use just one binary search to find the first item and search linearly afterwards the worst case would be if the entire function is the same value. For an array of length 10000, this would give 10013 comparisons in the worst case while using two binary searches would give 28 comparisons in the worst case for the same array. Of course, using the same size of array, the best case for the binary/linear search method would be 14 comparisons while the best case for two binary searches method is 26 comparisons.

** Update

Okay, here is a binary search to find the first appearance of an element in an array. I'll give you a recursive function (you can of course make it iterative and optimize this in other ways). This searches for the int val in the array a of ints. Also, I haven't been careful about finding the midpoint (if the array is really large there could be problems).

int bs1(int a[], int val, int left, int right)
{
    if(right == left) return left;
    int mid = (right+left)/2;

    if(val > a[mid]) return bs1(a, val, mid+1, right);
    else return bs1(a, val, left, mid);
}

However, you should check after you are returned an index that it actually refers to the correct value because if val is not in the array, the returned index will to correspond to the next element larger than val.

A few minor changes to this will make a function that finds the last element. The keys to doing this are using the comparators correctly and remembering that integer division always truncates.

Justin Peel
Can you please elaborate how to use two different binary functions in finding the first and last appearance, I mean how do we compute that whether it is the first or last value.
iecut
I updated my answer to give you one of the binary search function. Now you find one that finds the last appearance of the value in the array.
Justin Peel
thank a lot for all your help!!I am assuming that the max no. of comparisons if I take 2 binary search for finding the min and max index value are 2 times log 2 to the base n and the best case is log 2 to the base n......correct me if I am wrong in either of the case.where n is no. of elements.
iecut
umm.. that's not quite right. The best case is 2*floor(log base 2 of N) and the worst case is 2*ceil(log base 2 of N). The floor function rounds down; the ceil function rounds up. Take for instance an array of length 3. If I use the binary search function above, and the if(val > a[mid]) is true, then we have found the index with only 1 comparison, but if it isn't true, then it will take 2 comparisons. This corresponds with the worst and best cases above (though without the 2 in front because we are only using one binary search).
Justin Peel
A: 

This is fairly easy to do without writing your own binary search algorithm, by repeatedly calling a standard algorithm.

// some curly-bracket language:

// int BinarySearch(sortedList, searchIndex, searchLength, valueToFind)
// returns the zero-based index of the item in the list, or a negative value
// if the item is not found

int inner = BinarySearch(list, 0, listSize, value);
if(inner < 0){
    // handle case where value is not found in list
}

int bottom = inner, top = inner;
while(true){
    int i = BinarySearch(list, 0, bottom, value);
    if(i < 0)
        break;
    bottom = i;
}
while(true){
    int i = BinarySearch(list, top + 1, listSize - top - 1, value);
    if(i < 0)
        break;
    top = i;
}

// bottom and top now hold the bounds of all instances of value in list

This is pretty close to the same efficiency you'd get with a custom algorithm, except that you have more function call overhead.

As for the number of comparisons, I'd have to think a little harder to be sure, but I think it's just 2*log2N, where N is the number of items in the list.


Edit

Bah! It's not 2*log2N, because unlike what you could do with a custom algorithm, it doesn't incrementally exclude portions of the list. It appears1 that the maximum comparison count is (log2N - 0.5) * log2N. This is still only 885 comparisons for a list with 230 elements (390 comparisons for 220 N, and 95 for 210 N), but we can do better than that.

// int Compare(a, b)
// returns 0 if a and b are equal,
//         a negative value if a < b, or
//         a positive value if a > b

int start = 0, end = listSize, inner;

while(true){
    if(end == start){
        // handle case where value is not found in list
    }
    inner = (start + end) / 2;
    int cmp = Compare(list[inner], value);
    if(cmp == 0)
        break;
    if(cmp < 0)
        start = inner + 1;
    else end = inner;
}

int top = inner, bottom = inner;

while(true){
    if(start >= bottom)
        break;
    inner = (start + bottom) / 2;
    int cmp = Compare(list[inner], value);
    if(cmp == 0)
        bottom = inner;
    else start = inner + 1;
}

while(true){
    if(end - 1 <= top)
        break;
    inner = (top + 1 + end) / 2;
    int cmp = Compare(list[inner], value);
    if(cmp == 0)
        top = inner;
    else end = inner;
}

This will do at most 2*log2N comparisons. 230 items will require at most 60 comparisons, 220 items will require at most 40 comparisons, etc.


1 I determined this experimentally. I'm not quite smart enough to figure it out mathematically.

P Daddy
thanks for your help!!
iecut
A: 

There's no clean answer to the most efficient part of the question. That would depend on how many entries with the same value is to be expected. If it's a few the a linear search in both directtions of the array after finding one element will be you're fastest option but if you're expecting a lot of entries with the same value you could do kind of a binary search to find the start end indices.

Disclaimer: Not tested it ment to show the idea not to be used directly as production code int org = binarySearch(array,value) //do the binary search and find on element int min = org-delta; //delta is some constant based on how many elemts are to be expected int max = org; min = min < 0 ? 0 : min; int search= min; bool latestWasHit = false; while(search > 0) { if(search+1 == max) return max; if(array[search] != value) { min = search; search = search + (max-search)/2 } else { max = search; search = (search-min)/2; } }

and then the reverse for the upper bound. However it will require quite a lot of elements before this is faster than a simple linear search.

Rune FS
thanks for the help!!
iecut