Here's a simple O(N)
solution that uses O(N)
space. I'm assuming that we are restricting the input list to non-negative numbers and that we want to find the first non-negative number that is not in the list.
- Find the length of the list; lets say it is
N
.
- Allocate an array of
N
booleans, initialized to all false
.
- For each number
X
in the list, if X
is less than N
, set the X'th
element of the array to true
.
- Scan the array starting from index
0
, looking for the first element that is false
. If you find the first false
at index I
, then I
is the answer. Otherwise (i.e. when all elements are true
) the answer is N
.
In practice, the "array of N
booleans" would probably be encoded as a "bitmap" or "bitset" represented as a byte
or int
array. This typically uses less space (depending on the programming language) and allows the scan for the first false
to be done more quickly.
This is how / why the algorithm works.
Suppose that the N
numbers in the list are not distinct, or that one or more of them is greater than N
. This means that there must be at least one number in the range 0 .. N - 1
that is not in the list. So the problem of find the smallest missing number must therefore reduce to the problem of finding the smallest missing number less than N
. This means that we don't need to keep track of numbers that are greater or equal to N
... because they won't be the answer.
The alternative to the previous paragraph is that the list is a permutation of the numbers from 0 .. N - 1
. In this case, step 3 sets all elements of the array to true
, and step 4 tells us that the first "missing" number is N
.
The computational complexity of the algorithm is O(N)
with a relatively small constant of proportionality. It makes two linear passes through the list, or just one pass if the list length is known to start with. There is no need to represent the hold the entire list in memory, so the algorithm's asymptotic memory usage is just what is needed to represent the array of booleans; i.e. O(N)
bits.
(By contrast, algorithms that rely on in-memory sorting or partitioning assume that you can represent the entire list in memory. In the form the question was asked, this would require O(N)
64-bit words.)
@Jorn comments that steps 1 through 3 are a variation on counting sort. In a sense he is right, but the differences are significant:
- A counting sort requires an array of (at least)
Xmin - Xmin
counters where Xmax
is the largest number in the list and Xmin
is the smallest number in the list. Each counter has to be able to represent N states; i.e. assuming a binary representation it has to have an integer type (at least) ceiling(log2(N))
bits.
- To determine the array size, a counting sort needs to make an initial pass through the list to determine
Xmax
and Xmin
.
- The minimum worst-case space requirement is therefore
ceiling(log2(N)) * (Xmax - Xmin)
bits.
By contrast, the algorithm presented above simply requires N
bits in the worst and best cases.
However, this analysis leads to the intuition that if the algorithm made an initial pass through the list looking for a zero (and counting the list elements if required), it would give a quicker answer using no space at all if it found the zero. It is definitely worth doing this if there is a high probability of finding at least one zero in the list. And this extra pass doesn't change the overall complexity.
EDIT: I've changed the description of the algorithm to use "array of booleans" since people apparently found my original description using bits and bitmaps to be confusing.