tags:

views:

514

answers:

8

I have a std::vector containing a handful of numbers, which are not in any particular order, and may or may not have gaps between the numbers - for example, I may have { 1,2,3, 6 } or { 2,8,4,6 } or { 1, 9, 5, 2 }, etc.

I'd like a simple way to look at this vector and say 'give me the lowest number >= 1 which does not appear in the vector'. So,

for the three examples above, the answers would be 4, 1 and 3 respectively.

It's not performance critical, and the list is short so there aren't any issues about copying the list and sorting it, for example.

I am not really stuck for a way to do this, but my STL skills are seriously atrophied and I can feel that I'm about to do something inelegant - I would be interested to see what other people came up with.

+3  A: 

Sorting the list and then doing a linear search seems the simplest solution. Depending on the expected composition of the lists you could use a less general purpose sorting algorithm, and if you implement the sort yourself you could keep track of data during the sort that could be used to speed up (or eliminate entirely) the search step. I do not think there is any particularly elegant solution to this problem

Sparr
+1, simple and works
orip
+2  A: 

Sort-n-search:

std::sort(vec.begin(), vec.end());
int lowest = 1;
for(size_t ii = 1; ii < vec.size(); ++ii)
{
    if (vec[ii - 1] + 1 < vec[ii])
    {
        lowest = (vec[ii - 1] + 1);
        break;
    }
}

/* 1, 2, ..., N case */
if (lowest == vec[0]) lowest = (*vec.back()) + 1;

Iterators could be used with just as clear intent as showcased in @joe_mucchiello's (ed: better) answer.

sixlettervariables
OK, thanks very much. I've just done something like this, but with iterators and handling the end case as well.
Will Dean
{1,2,3,4} should return 5 not 1. This code accounts for the edge condition at the beginning, but not at the end.
Greg Rogers
@Will and @Greg, fixed. Iterators are left as an exercise for the reader ;)
sixlettervariables
You should just be comparing against lowest. You always know what the next value should be. See my answer below.
jmucchiello
I saw, I'll let yours get voted up over mine.
sixlettervariables
Sorry 6lv - I've moved the golden checkmark...
Will Dean
@Will Dean: As it should good sir, as it should ;D
sixlettervariables
+3  A: 

You could allocate a bit vector (of the same length as the input vector), initialize it to zero, then mark all indices that occur (note that numbers larger than the length can be ignored). Then, return the first unmarked index (or the length if all indices are marked, which only happens if all indices occur exactly once in the input vector).

This should be asymptotically faster than sort and search. It will use more memory than sorting if you are allowed to destroy the original, but less memory than sorting if you must preserve the original.

comingstorm
+2  A: 

Actually, if you do a bubble sort (you know... the one that they teach you first and then tell you to never use again...), you will be able to spot the first gap early in the sorting process, so you can stop there. That should give you the fastest overall time.

James Curran
Good observation. I wonder if a k-th smallest algorithm could also be adapted, or an nlogn sort?
Mike Dunlavey
Nice! There actually is a use for it.
chris
Only if the first gap is a fairly low number. In the worst case, all the numbers from [1..size] occur, and your runtime will be O(size^2)
comingstorm
O() notation is by definition the worst case.
MSalters
+5  A: 

The checked answer uses < for comparison. != is much simpler:

int find_gap(std::vector<int> vec) {
    std::sort(vec.begin(), vec.end());
    int next = 1;
    for (std::vector<int>::iterator it = vec.begin(); it != vec.end(); ++it) {
        if (*it != next) return next;
       ++next;
    }
    return next;
}

find_gap(1,2,4,5) = 3
find_gap(2) = 1
find_gap(1,2,3) = 4

I'm not passing a reference to the vector since a) he said time doesn't matter and b) so I don't change the order of the original vector.

jmucchiello
Nicer than mine, well played good sir.
sixlettervariables
A: 

OK, here's my 2 cents. Assume you've got a vector of length N.

  1. If N<=2 you can check directly
  2. First, use min_element to get the smallest element, remember it as emin
  3. Call nth_element to get the element at N/2, call it ehalf
  4. If ehalf != emin+N/2 there's a gap to the left, apply this method recursively there by calling nth_element on the whole array but asking for element N/4. Otherwise, recurse on the right asking for element 3*N/4.

This should be slightly better than sorting completely up front.

Thomas Kammeyer
Your assumption that emin+N/2 is going to be at index N/2 presumes that the array is already sorted.
Sparr
Um, no. I didn't presume that. I called nth_element to find just which element WOULD be there in sorted order. nth_element uses the median-find algorithm. There *is* and optimization possible: you can recurse on half the array (nth_element partitions).
Thomas Kammeyer
And by the way: this is *faster* than sort-n-search on average if you do optimize by only recursively processing the necessary half of the array.
Thomas Kammeyer
+1  A: 

you could go with something like....

struct InSequence
{
    int _current; bool insequence;
    InSequence() : _current(1), insequence(true){}
    bool operator()(int x) {    
     insequence = insequence ? (x == _current) : false;  
     _current++; 
     return insequence;
    }
};

int first_not_in_sequence(std::vector<int>& v)
{
    std::sort(v.begin(), v.end());
    return 1+std::count_if(v.begin(), v.end(),InSequence());
}
Keith Nicholas
You `insequence` test in op() is overly complicated. You can write it much simpler and clearer: `insequence = insequence
Konrad Rudolph
+1 for showing Kristo the STLish way to do this using a functor with state... Though you shouldn't use a lead underscore to mark local variables as those names are reserved for the compiler, int current_; has the same impact and it's guaranteed not to have conflicts.
ceretullis
+4  A: 

The STL algorithm you are looking for is adjacent_find.

Here is a solution that also uses boost.lambda to make the predicate clean:

int first_gap( std::vector<int> vec )
{
    using namespace boost::lambda;

    // Handle the special case of an empty vector.  Return 1.
    if( vec.empty() )
        return 1;

    // Sort the vector
    std::sort( vec.begin(), vec.end() );

    // Find the first adjacent pair that differ by more than 1.
    // The last parameter is a boost.lambda expression
    std::vector::iterator i = std::adjacent_find( vec.begin(), vec.end(), _1+1 < _2 );

    // Handle the special case of no gaps.  Return the highest value + 1.
    if ( i == vec.end() )
        --i;

    return 1 + *i;
}
Shmoopty
In many ways that's exactly what I asked for. I'm going to leave a slightly less STL-like version selected because I'm on a bit of a downer about endless predicate functions at the moment. Roll-on C++ lambdas.
Will Dean
I think this should be the accepted answer...not that its complete
Keith Nicholas
The question has a well-defined answer for empty vectors. In an empty vector, the smallest integer >= 1 that does not appear in the vector is 1. In STL terms: we're looking for the smallest N>=1 such that find(begin,end,N)==end.
MSalters
You are correct, MSalters. Code adjusted.
Shmoopty