views:

547

answers:

4

Let's say I need to retrieve the median from a sequence of 1000000 random numeric values.

If using anything but STL::list, I have no (built-in) way to sort sequence for median calculation.

If using STL::list, I can't randomly access values to retrieve middle (median) of sorted sequence.

Is it better to implement sorting myself and go with e.g. STL::vector, or is it better to use STL::list and use STL::list::iterator to for-loop-walk to the median value? The latter seems less overheadish, but also feels more ugly..

Or are there more and better alternatives for me?

+5  A: 

You can sort an std::vector using the library function std::sort.

std::vector<int> vec;
// ... fill vector with stuff
std::sort(vec.begin(), vec.end());
Charles Salvia
+11  A: 

Any random-access container (like std::vector) can be sorted with the standard std::sort algorithm, available in the <algorithm> header.

For finding the median, it would be quicker to use std::nth_element; this does enough of a sort to put one chosen element in the correct position, but doesn't completely sort the container. So you could find the median like this:

int median(vector<int> &v)
{
    int n = v.size() / 2;
    nth_element(v.begin(), v.begin()+n, v.end());
    return v[n];
}
Mike Seymour
Huh. I didn't realize that `nth_element` existed, I apparently re-implemented it in my answer...
ephemient
It should be noted that `nth_element` modifies the vector in unpredictable ways! You might want to sort a vector of indexes if necessary.
Matthieu M.
If the number of items is even, the median is the average of the middle *two*.
sje397
+1  A: 
ephemient
+1  A: 

The median is more complex than Mike Seymour's answer. The median differs depending on whether there are an even or an odd number of items in the sample. If there are an even number of items, the median is the average of the middle two items. This means that the median of a list of integers can be a fraction. Finally, the median of an empty list is undefined. Here is code that passes my basic test cases:

///Represents the exception for taking the median of an empty list
class median_of_empty_list_exception:public std::exception{
  virtual const char* what() const throw() {
    return "Attempt to take the median of an empty list of numbers.  "
      "The median of an empty list is undefined.";
  }
};

///Return the median of a sequence of numbers defined by the random
///access iterators begin and end.  The sequence must not be empty
///(median is undefined for an empty set).
///
///The numbers must be convertible to double.
template<class RandAccessIter>
double median(RandAccessIter begin, RandAccessIter end) 
  throw(median_of_empty_list_exception){
  if(begin == end){ throw median_of_empty_list_exception(); }
  std::size_t size = end - begin;
  std::size_t middleIdx = size/2;
  RandAccessIter target = begin + middleIdx;
  std::nth_element(begin, target, end);

  if(size % 2 != 0){ //Odd number of elements
    return *target;
  }else{            //Even number of elements
    double a = *target;
    RandAccessIter targetNeighbor= target-1;
    std::nth_element(begin, targetNeighbor, end);
    return (a+*targetNeighbor)/2.0;
  }
}
Eponymous