How do I find the above without removing the largest element and searching again? Is there a more efficient way to do this? It does not matter if these elements are duplicates.

A: 

Create a sublist from n..m and sort it descending. Then grab the first two elements, and delete those elements from the original list.

Byron Whitlock
+17  A: 

Use `partial_sort`?

std::partial_sort(aTest.begin(), aTest.begin() + 2, aTest.end(), Functor);

An Example:

std::vector<int> aTest;
aTest.push_back(3);
aTest.push_back(2);
aTest.push_back(4);
aTest.push_back(1);

std::partial_sort(aTest.begin(), aTest.begin() + 2, aTest.end(), std::greater<int>());

int Max = aTest[0];
int SecMax = aTest[1];
aJ
+1 I didn't know about partial_sort. BUT it doesn't work if he wants to keep the order of the original list.
Byron Whitlock
You can use partial_sort_copy for that
aJ
won't this get you the smallest elements? It's easy enough to fix by using a different comparison operator. +1
rmeador
Yes, I was writing the same. Thanks
aJ
And if you're streaming a large number of values through, say from an input_iterator, and don't want to store the entire range, only the top elements, then the heap operations (make_heap, push_heap and sort_heap) are the way to go.
Boojum
What's the difference from just `std::sort(aTest.begin(), aTest.end());`? aTest[0] - largest, aTest[1] - second largest.
Kirill V. Lyadvinsky
partial_sort has the special ability to stop sorting when only the first n elements need to be sorted. (Second argument)
aJ
+1 It could be faster than just `std::sort`.
Kirill V. Lyadvinsky
Any idea about the time complexity?
Jacob
From Josuttis: partial_sort() guarantees n * log(n) complexity in any case, and is normally between linear and n-log-n (approximately numberOfElements * log(numberOfSortedElements) comparisons).
aJ
+2  A: 

Let's assume you mean to find the two largest unique values in the list.

If the list is already sorted, then just look at the second last element (or rather, iterate from the end looking for the second last value).

If the list is unsorted, then don't bother to sort it. Sorting is at best O(n lg n). Simple linear iteration is O(n), so just loop over the elements keeping track:

std::vector<int>::value_type second_best = 0, best = 0;
for(std::vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
   if(*i > best) {
     second_best = best;
     best = *i;
   } else if(*i > second_best) {
     second_best = *i;
   }

There are of course other criteria, and these could all be put into the test inside the loop. However, if you mean that two elements that both have the same largest value should be found, you have to consider what happens when three or more elements all have this largest value, or when two or more elements have the second largest.

Will
Except that this does not handle duplicates in the list
ezpz
Need an else block if second_best < *i < best
Douglas Leeder
I don't need to find unique values, duplicates are fine - I've updated this in the question.
Jacob
+15  A: 
for (e: all elements) {
 if (e > largest) {
   second = largest;
   largest = e;
 } else if (e > second) {
   second = e;
 }
}

You could either initialize largest and second to an appropriate lower bound, or to the first two items in the list (check which one is bigger, and don't forget to check that the list has at least two items).

NomeN
Simple to code ... what more can you ask for? Admittedly doing a search is less "complex" but will almost certainly be slower due to memory access fun.
Goz
Sorting is by definition slower, because you have to inspect elements at least O(n log n) times. This is guaranteed to inspect elements only O(n) times (where n is the number of elements, of course). Only if the operation is to be repeated on the same list does sorting become more efficient.
NomeN
This is what I ended up implementing (should've probably mentioned that in the OP) but I was looking for something different (and possibly faster).
Jacob
Sorting can be done in fewer lines, and retrieving the largest/smallest elements is very efficient on an already sorted list, but you'll have to pay a larger cost up front to get it sorted. This, however, is the most efficient possible solution if you only ever search for a fixed number (in this case 2) of largest items.
NomeN
A: 

You can scan the list in one pass and save the 1st and 2nd values; that has O(n) efficiency, while sorting is O(n log n).

EDIT:
I think that partial sort is O(n log k)

Shay Erlichmen
+1  A: 

The answer depends if you just want the values, or also iterators pointing at the values.

Minor modification of @Will's answer.

std::vector<int>::value_type second_best = 0, best = 0;
for(std::vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
{
   if(*i > best)
   {
     second_best = best;
     best = *i;
   }
   else if (*i > second_best)
   {
     second_best = *i;
   }
}
Douglas Leeder
+4  A: 

nth_element(begin, begin+n,end,Compare) places the element that would be nth (where "first" is "0th") if the range [begin, end) were sorted at position begin+n and makes sure that everything from [begin,begin+n) would appear before the nth element in the sorted list. So the code you want is:

nth_element(container.begin(),
            container.begin()+1,
            container.end(),
            appropriateCompare);

This will work well in your case, since you're only looking for the two largest. Assuming your appropriateCompare sorts things from largest to smallest, the second largest element will be at position 1 and the largest will be at position 0.

Dan Hook
+1, nth_element has better runtime than partial_sort
Naveen
A: 

The optimal algorithm shouldn't need more than 1.5 * N - 2 comparisons. (Once we've decided that it's O(n), what's the coefficient in front of N? The 2 * N comparisons of the simple loop are more than necessary.)

So, first determine the "winner" and the "loser" in each pair - that's 0.5 * N comparisons.

Then determine the largest element by comparing winners - that's another 0.5 * N - 1 comparisons.

Then determine the second-largest element by comparing the loser of the pair where the largest element came from against the winners of all other pairs - another 0.5 * N - 1 comparisons.

Total comparisons = 1.5 N - 2.

azheglov
A: 

Untested but fun:

#include <algorithm>
#include <functional>
#include <iostream>
#include <list>

template <typename T, int n>
class top_n_functor : public std::unary_function<T, void>
{
public:
  void operator() (const T& x) {
     typename std::list<T>::iterator f =
         std::lower_bound(values_.begin(), values_.end(), x);

     if(values_.size() < n) {
         values_.insert(f, x);
         return;
     }

     if(values_.begin() == f)   // x is no better than the current worst
          return;

     values_.erase(values_.begin());  // drop the worst...
     values_.insert(f, x);            // ...and slot x in, keeping the list sorted
  }

  std::list<T> values() {
     return values_;
  }
private:
   std::list<T> values_;
};

int main()
{
  int A[] = {1, 4, 2, 8, 5, 7};
  const int N = sizeof(A) / sizeof(int);

  std::list<int> vals = std::for_each(A, A + N, top_n_functor<int,2>()).values();

  std::cout << "The top is " << vals.back()
            << " with second place being " << vals.front() << std::endl;
}
Todd Gardner
A: 

If the largest is the first element, search for the second largest in [largest+1,end). Otherwise search in [begin,largest) and [largest+1,end) and take the maximum of the two. Of course, this still takes about 2n comparisons, so it's not optimal.

If you have random-access iterators, you could do as quick sort does and use the ever-elegant recursion:

template< typename T >
std::pair<T,T> find_two_largest(const std::pair<T,T>& lhs, const std::pair<T,T>& rhs)
{
  // implementation finding the two largest of the four values left as an exercise :) 
}

template< typename RAIter >
std::pair< typename std::iterator_traits<RAIter>::value_type
         , typename std::iterator_traits<RAIter>::value_type > 
find_two_largest(RAIter begin, RAIter end)
{
  const std::ptrdiff_t diff = end-begin;
  if( diff < 2 )
    return std::make_pair(*begin, *begin);
  if( diff < 3 )
    return std::make_pair(*begin, *(begin+1));
  const RAIter middle = begin + (diff)/2;
  typedef std::pair< typename std::iterator_traits<RAIter>::value_type
                   , typename std::iterator_traits<RAIter>::value_type > 
    result_t;
  const result_t left = find_two_largest(begin,middle);
  const result_t right = find_two_largest(middle,end);

  return find_two_largest(left,right);
}

This has O(n) and shouldn't make more comparisons than NomeN's implementation.

sbi
A: 

Finding the top k is usually a bit better than n log(k):

template <class t,class ordering>
class TopK {
public:
    typedef std::multiset<t,ordering,special_allocator> BEST_t;
    BEST_t best;
    const size_t K;
    ordering compare;
    TopK(const size_t k)
        : K(k){
    }
    const BEST_t& insert(const t& item){
        if(best.size()<K){
            best.insert(item);
            return best;
        }
        //K items in the multiset now.
        //And here is why it's better: if the distribution is random, then
        //this comparison and the one above are usually the only ones done.
        if(compare(*best.begin(),item)){ //item better than the worst
           best.erase(best.begin());     //drop the worst
           best.insert(item);            //log(k-1) average, as only k-1 items in best
        }
        return best;
    }
    template <class it>
    const BEST_t& insert(it i,const it last){
        for(;i!=last;++i){
            insert(*i);
        }
        return best;
    }
};

Of course the special_allocator can in essence be just an array of k multiset value_types plus a free list of those nodes (which is typically empty, since the other k nodes are in use in the multiset until it's time to put a new one in, at which point we erase a node and then immediately reuse it). It's good to have this, or else the memory alloc/free in std::multiset and the cache-line churn kill you. It's a (very) tiny bit of work to give it static state without violating STL allocator rules.

Not as good as a specialized algorithm for exactly 2, but for fixed k << n, I would GUESS (2n + delta*n) where delta is small - my copy of Knuth's TAOCP vol. 3 (Sorting and Searching) is packed away, and an estimate of delta is a bit more work than I want to do.

The average worst case is, I would guess, n(log(k-1) + 2), when the input is in opposite order and all elements are distinct.

The best case is 2n + k log(k), for when the k best come first.

pgast