In Matters Computational I found this interesting linear search implementation (it's actually my Java implementation ;-)):

private static final int NOT_CONTAINED = -1;

public static int linearSearch(int[] a, int key) {
    int high = a.length - 1;
    int tmp  = a[high];

    // put a sentinel at the end of the array   
    a[high] = key;

    int i = 0;

    while (a[i] != key) {
        i++;
    }

    // restore original value
    a[high] = tmp;

    if (i == high && key != tmp) {
        return NOT_CONTAINED;
    }

    return i;
}

It basically uses a sentinel, which is the searched-for value itself, so that the loop always finds the value and doesn't have to check the array bounds on every iteration. The last element is first stored in a temp variable, and then the sentinel is placed at the last position. When the value is found (remember, it is always found because of the sentinel), the original element is restored, and the result is checked: if the match was at the last index and the original last element is unequal to the key, the value wasn't actually in the array and -1 (NOT_CONTAINED) is returned; otherwise the index is returned.
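
For illustration, a hypothetical call sequence (array values made up):

int[] a = {4, 8, 15, 16, 23, 42};
linearSearch(a, 15);  // returns 2
linearSearch(a, 42);  // returns 5: the hit at the sentinel position is genuine, since tmp == key
linearSearch(a, 99);  // returns NOT_CONTAINED: hit at the sentinel position, but tmp != key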

While I found this implementation really clever, I wonder whether it is actually useful. For small arrays it always seems to be slower, and for large arrays it only seems to be faster when the value is not found. Any ideas?

EDIT

The original implementation was written in C++, so that could make a difference.

+2  A: 

A sentinel search goes back to Knuth. Its value is that it reduces the number of tests in the loop from two ("does the key match? am I at the end?") to just one.
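
For comparison, a minimal sketch of the standard two-test loop (not from the original post):

public static int linearSearchStandard(int[] a, int key) {
    // two tests per iteration: the bounds check (i < a.length) and the match check
    for (int i = 0; i < a.length; i++) {
        if (a[i] == key) {
            return i;
        }
    }
    return -1; // NOT_CONTAINED in the question's code
}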

Yes, it's useful, in the sense that it should significantly reduce search times for modest-size unordered arrays, by eliminating one conditional test (and its branch) per iteration. Keeping the array unordered also keeps insertion cheap (insertion code not shown by the OP), because you don't have to keep the items in order.

If you have larger arrays of ordered items, a binary search will be faster, at the cost of larger insertion time to ensure the array is ordered.

For even larger sets, a hash table will be the fastest.
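
A sketch of both alternatives using the standard library (class name and values are made up; Arrays.binarySearch requires a sorted array):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SearchAlternatives {
    public static void main(String[] args) {
        // binary search: O(log n) lookups, but the array must stay sorted
        int[] sorted = {1, 3, 5, 9};
        int idx = Arrays.binarySearch(sorted, 5);  // non-negative index if found

        // hash set: expected O(1) lookups, at the cost of building the set up front
        Set<Integer> set = new HashSet<>();
        for (int v : sorted) {
            set.add(v);
        }
        boolean found = set.contains(5);

        System.out.println(idx + " " + found);  // prints "2 true"
    }
}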

The real question is: what is the distribution of the sizes of your arrays?

Ira Baxter
+5  A: 

Doesn't seem particularly useful. The "innovation" here is just to get rid of the iteration test by combining it with the match test. Modern processors spend essentially zero time on iteration checks (all the computation and branching gets done in parallel with the match-test code).

In any case, binary search kicks the ass of this code on large arrays, and is comparable on small arrays. Linear search is so 1960s.

Keith Randall
Linear search is better than binary search if the array isn't sorted and you only need to find a couple of values. Iteration check times are small, but not zero, in my experience. For instance, by eliminating some iteration checks in a C++ quicksort implementation recently, I was able to get a couple percent improvement in time. And yes, binary search is of course the way to go if your array is sorted, and it is better to sort and then use binary search if you need to search the array often.
Justin Peel
+8  A: 

It's not thread-safe. For example, you can lose your a[high] value: a second thread starts after the first has changed a[high] to its key, so it records that key into tmp, and it finishes after the first thread has restored a[high] to its original value. The second thread then restores a[high] to what it first saw, which was the first thread's key.
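
A hypothetical interleaving (keys and values made up for illustration):

// shared array, a[high] == 42 initially
// Thread 1 calls linearSearch(a, 7); Thread 2 calls linearSearch(a, 13)
//
// T1: tmp = a[high];   // T1 saves 42
// T1: a[high] = 7;     // T1 writes its sentinel
// T2: tmp = a[high];   // T2 saves 7 -- T1's sentinel, not the original 42
// T2: a[high] = 13;    // T2 writes its sentinel
// T1: a[high] = 42;    // T1 restores the original value
// T2: a[high] = 7;     // T2 "restores" what it first saw; the original 42 is lost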

It's also not useful in Java, since the JVM performs bounds checks on your array accesses, so your while loop is effectively checking that you're not going past the end of the array anyway.

Stephen Denne
+1 For mentioning thread safety, didn't think of that.
Helper Method
Actually, with respect to bounds checks, this method will definitely be slower, as I don't see how the JVM (not javac) could eliminate them here, whereas I've observed that it does so in tests with iteration over arrays like "int n=a.length; for(int i=0; i<n; ++i) process(a[i])".
jkff
@jkff, I corrected the reference to javac. Interestingly though, I've seen differences in performance due to using Eclipse's compiler vs. Sun's, presumably due to producing different bytecodes that either did or didn't match what a JVM optimization was looking for.
Stephen Denne
+2  A: 

See also the 'finding a tiger in Africa' joke.
Punchline: an experienced programmer places a tiger in Cairo so that the search always terminates.

Martin Beckett
+7  A: 

Will you ever notice any speed increase from this? No

Will you notice a lack of readability? Yes

Will you notice an unnecessary array mutation that might cause concurrency issues? Yes

Premature optimization is the root of all evil.

erikkallen
But this algorithm is given in Knuth as an optimization, and Knuth is the one who said your last line :)
Fakrudeen
A: 

Yes, it does, because the while loop doesn't have two comparisons per iteration, as opposed to the standard search.

It is twice as fast. It is given as an optimization in Knuth Vol. 3.

Fakrudeen
This is true, but it doesn't seem to hold for Java (which does the range check implicitly)
Helper Method