tags:

views:

173

answers:

10

I am using a std::map. Sometimes I will do an operation like: finding the median value of all items. e.g if I add

1 "s"
2 "sdf"
3 "sdfb"
4 "njw"
5 "loo"

then the median is 3.

Is there some solution without iterating over half the items in the map?

A: 

If you know the map will be sorted, then get the element at floor(length / 2). If you're in a bit twiddly mood, try (length >> 1).

Zach Rattner
OP: *"Is that some solution without iterate half of items in map?"*
Luther Blissett
@Ido: count from 0 for indexing, this is CS. `floor(len/2) = 2`, and 2 is the index of the 3rd element.
Amber
This is not the correct formula for the median.
InsertNickHere
Please don't ever do bit-shifts instead of arithmetic.
GMan
+1  A: 

In self balancing binary tree(std::map is one I think) a good approximation would be the root.
For exact value just cache it with a balance indicator, and each time an item added below the median decrease the indicator and increase when item is added above. When indicator is equal to 2/-2 move the median upwards/downwards one step and reset the indicator.

Dani
Good idea. Most of time, approximation is ok.This means i should rewrite map, and keep indicator?if adjust indicator in each insertion operation, that is inefficient, because there will be map.size() comparison.
Raymond
You dont have to rewrite map, you can subclass it. and why inefficient? this is O(1) method on instertion.
Dani
you are right, i can subclass it, and set a member variable to save the indicator. thank you, this is also an optional solution.
Raymond
STL classes aren't designed to be subclassed. See http://stackoverflow.com/questions/1647298
Tom Sirgedas
+1  A: 

If you can switch data structures, store the items in a std::vector and sort it. That will enable accessing the middle item positionally without iterating. (It can be surprising but a sorted vector often out-performs a map, due to locality. For lookups by the sort key you can use binary search and it will have much the same performance as a map anyway. See Scott Meyer's Effective STL.)

Daniel Earwicker
sorry, i cannot switch to vector, because items in map is inserted dynamicly.
Raymond
You can `push_back` on the vector, and then sort later. Try timing it - the total cost of all the operations may be less.
Daniel Earwicker
BTW. there's a nice function `nth_element` with sorts RandomAccessIterator's until the first n element are sorted. You'd find the median median with `nth_element(v.begin(),v.begin+v.size()/2,v.end());`
Luther Blissett
@Raymond You cant compute the median of a non sorted set of values. The term is only defined for sorded values. So you have to sort your map anyway...
InsertNickHere
@InsertNickHere A map _is_ sorted.
chrispy
So why do you all talk about sorting? :)
InsertNickHere
The reason sorting is relevant is because this answer suggests using a `vector` instead of a `map`. A `vector` can be put into a sorted state (it has a `sort` method) but then will likely become unsorted as soon as you modify its contents. So if you use a `vector`, you lose automatic sorting, but you gain (a) control over when and how sorting is done, (b) constant-time access to the median position, (b) greater locality: your data in a contiguous chunk instead of scattered around the address space, which often improves performance.
Daniel Earwicker
Getting median will be called only several times, but inserting is common case, so i vector is not a suitable container.
Raymond
@Raymond - I may be misunderstanding you, but if you insert many times *without* yet requiring the vector to be sorted, then you don't have to sort after every insert. You can do a single sort when you need the median. As as Luther Blissett pointed out, you don't even need to fully sort it.
Daniel Earwicker
my scenario is: insert item and query item frequently, so i must ensure the items is sorted at any time. and sometimes, i need retrive median value of the whole items. thank you.
Raymond
+1  A: 

Try:

typedef std::map<int,std::string>  Data;
Data           data;
Data::iterator median = std::advance(data.begin(), data.size() / 2); 

Works if the size() is odd. I'll let you work out how to do it when size() is even.

Martin York
This is the cleanest way to get the median. However, you are still iterating through half of the elements because std::map::iterator is a bidirectional iterator and incapable of random access.
A. Levy
Yep. But not much choice when your data structure is map. But if the data structure is ever changed to any other container type the std::advance() will still work and automatically provide the most efficient method of advancing.
Martin York
This equalsiter = map.begin();while ( i < N/2) ++iter;the performance is poor.
Raymond
Martin, I agree whole-heartedly! You solution is the first thing I would try if I had to use a map. I just had to be pedantic and point out that the solution doesn't quite satisfy the OP's criteria.
A. Levy
@Raymond: Poor compared to what. The complexity is O(n) and unless you use vector you are not going to get better.
Martin York
@Raymond: Poor is a comparative term. There's no other way. Have you actually *profiled* anything and determined this method will be "too slow"? If not, just get your code working in a straightforward fashion first. If your profiling results you spend most of your time here, cache the value for an approximation, or try different structures (like `vector`).
GMan
I am sorry, I just mean "iterate the former half items" is my first solution, and through profiling, i find i should make a optimization.
Raymond
A: 

For a sortet list, here it is in java code, but i assume, its very easy to port to c++:

    if (input.length % 2 != 0) {
        return input[((input.length + 1) / 2 - 1)];
    } else {
        return 0.5d * (input[(input.length / 2 - 1)] + input[(input.length / 2 + 1) - 1]);
    }
InsertNickHere
STL's map does not offer random access.
Peter G.
+5  A: 

I think the answer is no. You cannot just jump to the N / 2 item past the beginning because a std::map uses bidirectional iterators. You must iterate through half of the items in the map. If you had access to the underlying Red/Black tree implementation that is typically used for the std::map, you might be able to get close like in Dani's answer. However, you don't have access to that as it is encapsulated as an implementation detail.

A. Levy
+1 Give this man a cookie. To operate in constant time, it would have to support random access iterators, or the underlying datatype, both of which std::map does not provide.
Merlyn Morgan-Graham
I love cookies! Thanks Merlyn!
A. Levy
A: 

The nth_element() method is there for you for this :) It implements the partition part of the quick sort and you don't need your vector (or array) to be sorted. And also the time complexity is O(n) (while for sorting you need to pay O(nlogn)).

Atul
OP asks about `std::map`, not about `std::vector`.
Kirill V. Lyadvinsky
but if you look closely, actually his data structure is a vector. in map keys are sorted not values.
Atul
+2  A: 

I think you can solve the problem by using two std::map. One for smaller half of items (mapL) and second for the other half (mapU). When you have insert operation. It will be either case:

  • add item to mapU and move smallest element to mapL
  • add item to mapL and move greatest element to mapU

In case the maps have different size and you insert element to the one with smaller number of elements you skip the move section. The basic idea is that you keep your maps balanced so the maximum size difference is 1 element. As far as I know STL all operations should work in O(ln(n)) time. Accessing smallest and greatest element in map can be done by using iterator. When you have n_th position query just check map sizes and return greatest element in mapL or smallest element in mapR.

The above usage scenario is for inserting only but you can extend it to deleting items as well but you have to keep track of which map holds item or try to delete from both.

Here is my code with sample usage:

#include <iostream>
#include <string>
#include <map>
using namespace std;

typedef pair<int,string> pis;
typedef map<int,string>::iterator itis;

map<int,string>Left;
map<int,string>Right;

itis get_last(map<int,string> &m){
    return (--m.end());
}

int add_element(int key, string val){
    if (Left.empty()){
        Left.insert(make_pair(key,val));
        return 1;
    }

    pis maxl = *get_last(Left);
    if (key <= maxl.first){
        Left.insert(make_pair(key,val));
        if (Left.size() > Right.size() + 1){
            itis to_rem = get_last(Left);
            pis cpy = *to_rem;
            Left.erase(to_rem);
            Right.insert(cpy);
        }
        return 1;
    } else {
        Right.insert(make_pair(key,val));
        if (Right.size() > Left.size()){
            itis to_rem = Right.begin();
            pis cpy = *to_rem;
            Right.erase(to_rem);
            Left.insert(*to_rem);
        }
        return 2;
    }   
}

pis get_mid(){
    int size = Left.size() + Right.size();
    if (Left.size() >= size / 2){
        return *(get_last(Left));
    }
    return *(Right.begin());
}

int main(){
    Left.clear();
    Right.clear();

    int key;
    string val;
    while (!cin.eof()){
        cin >> key >> val;
        add_element(key,val);
        pis mid = get_mid();
        cout << "mid " << mid.first << " " << mid.second << endl;
    }
}
jethro
+1 this is a pretty clever solution! You could easily build on this and encapsulate it in a class that could be used as a replacement for `std::map` but with an additional `median()` method. Good thinking!
A. Levy
cool!! This sulotion is very nice. THANK YOU.
Raymond
A: 

I know no way to get the median from a pure STL map quickly for big maps. If your map is small or you need the median rarely you should use the linear advance to n/2 anyway I think - for the sake of simplicity and being standard.

You can use the map to build a new container that offers median: Jethro suggested using two maps, based on this perhaps better would be a single map and a continuously updated median iterator. These methods suffer from the drawback that you have to reimplement every modifiying operation and in jethro's case even the reading operations.

A custom written container will also do what you what, probably most efficiently but for the price of custom code. You could try, as was suggested to modify an existing stl map implementation. You can also look for existing implementations.

There is a super efficient C implementation that offers most map functionality and also random access called Judy Arrays. These work for integer, string and byte array keys.

Peter G.
COOL, this is helpful for implementation of private map. thank you.
Raymond
A: 

Since it sounds like insert and find are your two common operations while median is rare, the simplest approach is to use the map and std::advance( m.begin(), m.size()/2 ); as originally suggested by David Rodríguez. This is linear time, but easy to understand so I'd only consider another approach if profiling shows that the median calls are too expensive relative to the work your app is doing.

Mark B