How do I find the above without removing the largest element and searching again? Is there a more efficient way to do this? It does not matter if these elements are duplicates.

A: 

Create a sublist from n..m and sort it descending. Then grab the first two elements, and delete those elements from the original list.

Byron Whitlock
+17  A: 

Use `partial_sort`?

std::partial_sort(aTest.begin(), aTest.begin() + 2, aTest.end(), Functor);

An Example:

std::vector<int> aTest;
aTest.push_back(3);
aTest.push_back(2);
aTest.push_back(4);
aTest.push_back(1);

std::partial_sort(aTest.begin(), aTest.begin() + 2, aTest.end(), std::greater<int>());

int Max = aTest[0];
int SecMax = aTest[1];
aJ
+1 I didn't know about partial_sort. BUT it doesn't work if he wants to keep the order of the original list.
Byron Whitlock
You can use partial_sort_copy for that
aJ
won't this get you the smallest elements? It's easy enough to fix by using a different comparison operator. +1
rmeador
Yes, I was writing the same. Thanks
aJ
And if you're streaming a large number of values through, say from an input_iterator, and don't want to store the entire range, only the top elements, then the heap operations (make_heap, push_heap and sort_heap) are the way to go.
Boojum
What's the difference from just `std::sort(aTest.begin(), aTest.end());`? aTest[0] - largest, aTest[1] - second largest.
Kirill V. Lyadvinsky
partial_sort has the special ability to stop sorting when only the first n elements need to be sorted. (Second argument)
aJ
+1 It could be faster than just `std::sort`.
Kirill V. Lyadvinsky
Any idea about the time complexity?
Jacob
From Josuttis: partial_sort() guarantees n * log(n) complexity in any case, and is normally between linear and n-log-n (approximately numberOfElements * log(numberOfSortedElements) comparisons).
aJ
+2  A: 

Let's assume you mean to find the two largest unique values in the list.

If the list is already sorted, then just look at the second last element (or rather, iterate from the end looking for the second last value).

If the list is unsorted, then don't bother to sort it. Sorting is at best O(n lg n). Simple linear iteration is O(n), so just loop over the elements keeping track:

std::vector<int>::value_type second_best = 0, best = 0;
for(std::vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
   if(*i > best) {
     second_best = best;
     best = *i;
   } else if(*i > second_best) {
     second_best = *i;
   }

There are of course other criteria, and these could all be put into the test inside the loop. However, if you mean that two elements that both have the same largest value should be found, you have to consider what happens when three or more elements all have this largest value, or when two or more elements have the second largest.

Will
Except that this does not handle duplicates in the list
ezpz
Need an else block if second_best < *i < best
Douglas Leeder
I don't need to find unique values, duplicates are fine - I've updated this in the question.
Jacob
+15  A: 
for (e: all elements) {
 if (e > largest) {
   second = largest;
   largest = e;
 } else if (e > second) {
   second = e;
 }
}

You could either initialize largest and second to an appropriate lower bound, or to the first two items in the list (check which one is bigger, and don't forget to check that the list has at least two items).

NomeN
Simple to code ... what more can you ask for? Admittedly doing a search is less "complex" but will almost certainly be slower due to memory access fun.
Goz
Sorting is by definition slower, because you have to inspect elements at least O(n log n) times. This is guaranteed to inspect elements only O(n) times (where n is the number of elements, of course). Only if the operation is to be repeated on the same list does sorting become more efficient.
NomeN
This is what I ended up implementing (should've probably mentioned that in the OP) but I was looking for something different (and possibly faster).
Jacob
Sorting can be done in fewer lines, and retrieving the largest/smallest elements is very efficient on an already sorted list, but you'll have to pay a larger cost up front to get it sorted. This, however, is the most efficient possible solution if you only ever search for a fixed number (in this case 2) of largest items.
NomeN
A: 

You can scan the list in one pass and save the 1st and 2nd values; that has O(n) efficiency, while sorting is O(n log n).

EDIT:
I think that partial sort is O(n log k)

Shay Erlichmen
+1  A: 

The answer depends if you just want the values, or also iterators pointing at the values.

Minor modification of @Will's answer.

std::vector<int>::value_type second_best = 0, best = 0;
for(std::vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
{
   if(*i > best)
   {
     second_best = best;
     best = *i;
   }
   else if (*i > second_best)
   {
     second_best = *i;
   }
}
Douglas Leeder
+4  A: 

nth_element(begin, begin+n,end,Compare) places the element that would be nth (where "first" is "0th") if the range [begin, end) were sorted at position begin+n and makes sure that everything from [begin,begin+n) would appear before the nth element in the sorted list. So the code you want is:

nth_element(container.begin(),
            container.begin()+1,
            container.end(),
            appropriateCompare);

This will work well in your case, since you're only looking for the two largest. Assuming your appropriateCompare sorts things from largest to smallest, the second largest element will be at position 1 and the largest will be at position 0.

Dan Hook
+1, nth_element has better runtime than partial_sort
Naveen
A: 

The optimal algorithm shouldn't need more than 1.5 * N - 2 comparisons. (Once we've decided that it's O(n), what's the coefficient in front of N? The 2 * N comparisons of the simple loop are more than necessary.)

So, first determine the "winner" and the "loser" in each pair - that's 0.5 * N comparisons.

Then determine the largest element by comparing winners - that's another 0.5 * N - 1 comparisons.

Then determine the second-largest element by comparing the loser of the pair where the largest element came from against the winners of all other pairs - another 0.5 * N - 1 comparisons.

Total comparisons = 1.5 N - 2.

azheglov
A: 

Untested but fun:

#include <algorithm>
#include <functional>
#include <iostream>
#include <list>

template <typename T, int n>
class top_n_functor : public std::unary_function<T, void>
{
public:
  void operator() (const T& x) {
     typename std::list<T>::iterator f =
         std::lower_bound(values_.begin(), values_.end(), x);

     if(values_.size() < n) {
         values_.insert(f, x);
         return;
     }

     if(values_.begin() == f)   // x is no better than the current worst
          return;

     values_.erase(values_.begin());  // drop the worst...
     values_.insert(f, x);            // ...and slot x in, keeping the list sorted
  }

  std::list<T> values() {
     return values_;
  }
private:
   std::list<T> values_;
};

int main()
{
  int A[] = {1, 4, 2, 8, 5, 7};
  const int N = sizeof(A) / sizeof(int);

  std::list<int> vals = std::for_each(A, A + N, top_n_functor<int,2>()).values();

  std::cout << "The top is " << vals.back()
            << " with second place being " << vals.front() << std::endl;
}
Todd Gardner
A: 

If the largest is the first element, search for the second largest in [largest+1,end). Otherwise search in [begin,largest) and [largest+1,end) and take the maximum of the two. Of course, this still takes about 2n comparisons, so it's not optimal.

If you have random-access iterators, you could do as quick sort does and use the ever-elegant recursion:

template< typename T >
std::pair<T,T> find_two_largest(const std::pair<T,T>& lhs, const std::pair<T,T>& rhs)
{
  // implementation finding the two largest of the four values left as an exercise :) 
}

template< typename RAIter >
std::pair< typename std::iterator_traits<RAIter>::value_type
         , typename std::iterator_traits<RAIter>::value_type > 
find_two_largest(RAIter begin, RAIter end)
{
  const std::ptrdiff_t diff = end-begin;
  if( diff < 2 )
    return std::make_pair(*begin, *begin);
  if( diff < 3 )
    return std::make_pair(*begin, *(begin+1));
  const RAIter middle = begin + (diff)/2;
  typedef std::pair< typename std::iterator_traits<RAIter>::value_type
                   , typename std::iterator_traits<RAIter>::value_type > 
    result_t;
  const result_t left = find_two_largest(begin,middle);
  const result_t right = find_two_largest(middle,end);

  return find_two_largest(left,right);
}

This has O(n) and shouldn't make more comparisons than NomeN's implementation.

sbi
A: 

Finding the top k is usually a bit better than n log(k):

template <class t,class ordering>
class TopK {
public:
    typedef std::multiset<t,ordering,special_allocator> BEST_t;
    BEST_t best;
    const size_t K;
    ordering compare;
    TopK(const size_t k)
        : K(k){
    }
    const BEST_t& insert(const t& item){
        if(best.size()<K){
            best.insert(item);
            return best;
        }
        //K items in the multiset now.
        //And here is why it's better: if the distribution is random, then
        //this comparison and the one above are usually the only ones done.
        if(compare(*best.begin(),item)){ //item better than the worst
           best.erase(best.begin());     //drop the worst
           best.insert(item);            //log(k-1) average, as only k-1 items in best
        }
        return best;
    }
    template <class it>
    const BEST_t& insert(it i,const it last){
        for(;i!=last;++i){
            insert(*i);
        }
        return best;
    }
};

Of course the special_allocator can in essence be just an array of k multiset value_types plus a free list of those nodes (which is typically empty, since the other k nodes are in use in the multiset until it's time to put a new one in, at which point we erase a node and then immediately reuse it). It's good to have this, or else the memory alloc/free in std::multiset and the cache-line churn kill you. It's a (very) tiny bit of work to give it static state without violating STL allocator rules.

Not as good as a specialized algorithm for exactly 2, but for fixed k << n, I would GUESS (2n + delta*n) where delta is small - my copy of Knuth's TAOCP vol. 3 (Sorting and Searching) is packed away, and an estimate of delta is a bit more work than I want to do.

The average worst case is, I would guess, n(log(k-1) + 2), when the input is in opposite order and all elements are distinct.

The best case is 2n + k log(k), for when the k best come first.

pgast