views:

328

answers:

5

I wrote this for self education. How does it look? (It passes all my junit tests.)

/**
 * find needle in haystack 
 * @param haystack a sorted list 
 * @param needle
 * @return i such that <code>haystack[i] == needle</code>, or -1 if it is not found
 */
public static int find(char[] haystack, char needle) {
 int minWindow = 0; // lower bound of the search
 int maxWindow = haystack.length - 1; // upper bound of the search
 int midpoint = (maxWindow + minWindow) / 2;

 while (maxWindow - minWindow > 1) {
  if (haystack[midpoint] == needle) {
   return midpoint;
  }
  if (haystack[maxWindow] == needle) {
   return maxWindow;
  }
  if (haystack[minWindow] == needle) {
   return minWindow;
  }
  if (haystack[midpoint] < needle) {
   minWindow = midpoint;
  }
  else if (haystack[midpoint] > needle) {
   maxWindow = midpoint;
  }
  midpoint = (maxWindow + minWindow) / 2;
 }

 return -1;
}
+4  A: 

What if the array has only one element, the one you're looking for?

Moishe
It will also fail if the array has two elements. `minWindow = 0`, `maxWindow = 1`, and the `while` test `maxWindow - minWindow > 1` will fail and it will return -1.
jasonh
+1  A: 

It is a lot slower than it needs to be (in the typical case) because you are doing 3 extra test which will fail except for the last time through. You are also missing the case where there are only one or two entries.

Personally I would move the following after the loop:

        if (haystack[maxWindow] == needle) {
                return maxWindow;
        }
        if (haystack[minWindow] == needle) {
                return minWindow;
        }

And remove the following totally:

    if (haystack[midpoint] == needle) {
            return midpoint;
    }
Dipstick
+2  A: 

@Rosarch - given the responses, there are few things to learn:

  1. Cute but nonsensical names for variables are not a good idea, especially in interfaces.

  2. Your method javadoc should clearly describe the purpose of a method and any preconditions on its inputs. "Find needle in haystack" is clearly bogus, and the input is an array of characters not a list. (The latter is an important distinction since most Java developers equate "list" with List, and a "sorted" also has different connotations in that context.)

  3. Your unit tests clearly didn't cover the boundary cases where haystack has 0 or 1 entries. You should always think hard about ... and unit test the boundary cases, because they have a significant chance of different behaviour/errors to the normal cases.

  4. Unit tests are no substitute for carefully examining your own code. (Code review by someone else is also good, though you cannot rely on having someone else to do a code review for you).

  5. Binary search implementations are available in JDK and third party libraries, and the chances are that they are as good as (and probably better than) something you can afford to implement yourself. [I acknowledge that you have a sound reason to implement yourself the code in this case.]

Stephen C
+4  A: 

Your implementation suffers from the same problem outlined here:

Nearly All Binary Searches and Mergesorts are Broken

midpoint = (maxWindow + minWindow) / 2;

The problem is that if the array is sufficiently large, the addition will overflow. There's no great shame in making that mistake, as the article points out pretty much everybody does. This bug went undetected for 9 years in the standard Java implementation.

Dan Dyer
Beat me to it :)
Edan Maor
+1  A: 

As others have already pointed out, your implementation has a couple of bugs that need to be fixed. I'll add that, unless you have a good reason for it, it's better to return -index-1 when the item is not in the list, where index is the index at which it should be inserted to keep the list remain sorted. In fact, this is how binary search is implemented in the JDK:

public static int binarySearch(char[] a, char key)

Searches the specified array of chars for the specified value using the binary search algorithm. The array must be sorted (as by the sort method, above) prior to making this call. If it is not sorted, the results are undefined. If the array contains multiple elements with the specified value, there is no guarantee which one will be found.

Parameters:

a - the array to be searched. key - the value to be searched for.

Returns:

index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size(), if all elements in the list are less than the specified key. Note that this guarantees that the return value will be >= 0 if and only if the key is found.

Bytecode Ninja