I have a vector of doubles. I want to sort it from highest to lowest, and get the indices of the top K elements. std::sort just sorts in place, and does not return the indices I believe. What would be a quick way to get the top K indices of largest elements?
Not sure about pre-canned algorithms, but take a look at selection algorithms; if you need the top K elements of a set of N values and N is much larger than K, there are much more efficient methods.
If you can create an indexing class (like @user470379's answer -- basically a class that encapsulates a pointer/index to the "real" data which is read-only), then use a priority queue of maximum size K, and add each unsorted element to the priority queue, popping off the bottom-most element when the queue reaches size K+1. In cases like N = 106, K = 100, this handles cases much more simply + efficiently than a full sort.
The first thing that comes to mind is somewhat hackish, but you could define a struct that stored both the double and its original index, then overload the < operator to sort based on the double:
struct s {
double d;
int index;
bool operator < (const struct &s) const {
return d < s.d;
}
};
Then you could retrieve the original indices from the struct.
Fuller example:
vector<double> orig;
vector<s> v;
...
for (int i=0; i < orig.size(); ++i) {
s s_temp;
s_temp.d = orig[i];
s_temp.index = i;
v.push_back(s);
}
sort(v.begin(), v.end());
//now just retrieve v[i].index
This will leave them sorted from smallest to largest, but you could overload the > operator instead and then pass in greater to the sort function if wanted.
OK, how about this?
bool isSmaller (std::pair<double, int> x, std::pair<double, int> y)
{
return x.first< y.first;
}
int main()
{
//...
//you have your vector<double> here, say name is d;
std::vector<std::pair<double, int> > newVec(d.size());
for(int i = 0; i < newVec.size(); ++i)
{
newVec[i].first = d[i];
newVec[i].second = i; //store the initial index
}
std::sort(newVec.begin(), newVec.end(), &isSmaller);
//now you can iterate through first k elements and the second components will be the initial indices
}
Use multimap
for vector
's (value, index) to handle dups. Use reverse iterators to walk results in descending order.
#include <multimap>
#include <vector>
using namespace std;
multimap<double, size_t> indices;
vector<double> values;
values.push_back(1.0);
values.push_back(2.0);
values.push_back(3.0);
values.push_back(4.0);
size_t i = 0;
for(vector<double>::const_iterator iter = values.begin();
iter != values.end(); ++iter, ++i)
{
indices.insert(make_pair<double,int>(*iter, i));
}
i = 0;
size_t limit = 2;
for (multimap<double, size_t>::const_reverse_iterator iter = indices.rbegin();
iter != indices.rend() && i < limit; ++iter, ++i)
{
cout << "Value " << iter->first << " index " << iter->second << endl;
}
Output is
Value 4 index 3
Value 3 index 2
If you just want the vector
indices after sort, use this:
#include <algorithm>
#include <vector>
using namespace std;
vector<double> values;
values.push_back(1.0);
values.push_back(2.0);
values.push_back(3.0);
values.push_back(4.0);
sort(values.rbegin(), values.rend());
The top K entries are indexed by 0 to K-1, and appear in descending order. This uses reverse iterators combined with standard sort
(using less<double>
to achieve descending order when iterated forward. Equivalently:
sort(values.rbegin(), values.rend(), less<double>());
Sample code for the excellent nth_element
solution suggested by @Kiril here (K = 125000, N = 500000). I wanted to try this out, so here it is.
vector<double> values;
for (size_t i = 0; i < 500000; ++i)
{
values.push_back(rand());
}
nth_element(values.begin(), values.begin()+375000, values.end());
sort(values.begin()+375000, values.end());
vector<double> results(values.rbegin(), values.rbegin() + values.size() - 375000);
you could use the nth_element
STL algorithm - this will return you the N greatest elements ( this is the fastest way,using stl ) and then use .sort on them,or you could use the partial_sort algorithm,if you want the first K elements to be sorted (:
Using just .sort is awful - it is very slow for the purpose you want.. .sort is great STL algorithm,but for sorting the whole container,not just the first K elements (; it's not an accident the existung of nth_element and partial_sort ;)
So you actually need a structure that maps indices to corresponding doubles.
You could use std::multimap
class to perform this mapping. As Jason have noted std::map
does not allow duplicate keys.
std::vector<double> v; // assume it is populated already
std::multimap<double, int> m;
for (int i = 0; i < v.size(); ++i)
m.insert(std::make_pair(v[i], i));
...
After you've done this you could iterate over first ten elements as map preserves sorting of keys to the elements.