So far you received multiple advices most of which state that linear search makes no sense on sorted data, when binary search will work much more efficiently instead. This often happens to be one of those popular "sounds right" assertions made by people who don't care to give the problem too much thought. In reality, if you consider the bigger picture, given the right circumstances, linear search can be much more efficient than binary search.
Note, that if we consider a single search query for a sorted array, binary search is significantly more efficient method than linear search. There's no argument about that. Also, when you perform multiple completely random queries to the same data binary search still wins over linear search.
However, the picture begins to change if the queries are not exactly random. Imagine that queries arrive in sorted order, i.e. each next query is for a higher value than the previous query. I.e. the queries are also sorted. BTW, they don't have to be globally and strictly sorted, from time to time the query sequence might get "reset", i.e. a low value is queried, but on average the consequent queries should arrive in increasing order. In other words, the queries arrive in series, each series sorted in ascending order. In this case, if the average length of the series is comparable to the length of your array, linear search will outperform binary search by a huge margin. However, to take advantage of this situation, you have to implement your search in incremental manner. It is simple: if the next query is greater than the previous one, you don't need to start the search from the beginning of the array. Instead, you can search from the point where the previous search stopped. The most simplistic implementation (just to illustrate the idea) might look as follows
static int linear(const int *arr, int n, int key)
{
static int previous_key = INT_MIN;
static int previous_i = 0;
i = key >= previous_key ? previous_i : 0;
while (i < n) {
if (arr[i] >= key)
break;
++i;
}
previous_key = key;
previous_i = i;
return i;
}
(Disclaimer: the above implementation is terribly ugly for the obvious reason that the array is arriving from outside as a parameter, while the previous search state is stored internally. Of course, this is wrong way to do it in practice. But again, the above is intended to illustrate the idea and no more).
Note, that the complexity of processing each series of ordered queries using the above approach is always O(N)
, regardless of the length of the series. Using the binary search, the complexity would be O(M * log N)
. So, for obvious reasons when M
is close to N
, i.e. queries arrive in sufficiently long ordered series, the above linear search will significantly outperform binary search, while for small M
the binary search will win.
Also, even if the ordered series of queries are not very long, the above modification might still give you a noticeable improvement in search performance, considering that you have to use linear search.
P.S. As an additional piece of information about the structure of the problem:
When you need to perform the search in an ordered array of length N
and you know in advance that the queries will arrive in ordered series of [approximate, average] length M
, the optimal algorithm will look as follows
- Calculate the stride value
S = [N/M]
. It might also make sense to "snap" the value of S
to the [nearest] power of 2. Think of your sorted array as a sequence of blocks of length S
- S-blocks.
- After receiving a query, perform incremental linear search for the S-block that potentially contains the queried value, i.e. it is an ordinary linear search with stride
S
(of course, remember to start from the block where the previous search left off).
- After finding the S-block, perform the binary search within the S-block for the queried value.
The above is the most optimal incremental search algorithm possible, in a sense that it achieves the theoretical limit on the asymptotic efficiency of repetitive search. Note, that if the value of M
is much smaller then N
, the algorithm "automatically" shifts itself towards binary search, while when M
gets close to N
the algorithm "automatically" favors linear search. The latter makes sense because in such environment linear search is significantly more efficient than binary search.
This all is just to illustrate the fact that blanket statements like "linear search on a sorted array is always useless" indicate nothing else than lack of knowledge on the part of those who make such statements.