views: 624
answers: 8

I'm trying to use the STL to solve the following problem (I don't want to implement my own data structure if I don't have to). I've come up with a working implementation, but I'm hoping there's something faster... or is the code I have the best way to do it?

I have a large data set in which each entry contains two items: a key and a size. There are multiple entries in the data set with the same key. What I need to know is: for each key, how many entries in the data set have that key, and what is the total size of those entries? For example, given this data set (key, size):

(1, 3)
(3, 27)
(7, 7)
(3, 2)
(1, 1)

I want to generate this output, sorted by size ascending:

Key 1:  Size 4, Count 2
Key 7:  Size 7, Count 1
Key 3:  Size 29, Count 2

Since the data set is completely unsorted, I first need to aggregate the keys to count them and sum up the sizes. Then I need to re-sort that data structure by total size to produce the final output. This is the code I've come up with to accomplish the task using std::map and std::vector:

struct Node
{
    int Size;
    int Count;

    Node()
        : Size(0), Count(0)
    {
    }

    Node(int size)
        : Size(size), Count(1)
    {
    }
};

void map_insert(std::map<int, Node> &map, int key, int size)
{
    std::map<int, Node>::iterator itr = map.find(key);

    if (itr != map.end())
    {
        itr->second.Count++;
        itr->second.Size += size;
    }
    else
    {
        map[key] = Node(size);
    }
}

bool compare(const std::pair<int, Node> &a1, const std::pair<int, Node> &a2)
{
    return a1.second.Size < a2.second.Size;
}

int _tmain(int argc, _TCHAR* argv[])
{
    std::map<int, Node> _map;

    map_insert(_map, 1, 3);
    map_insert(_map, 3, 27);
    map_insert(_map, 7, 7);
    map_insert(_map, 3, 2);
    map_insert(_map, 1, 1);

    std::vector<std::pair<int, Node>> v(_map.begin(), _map.end());
    std::sort(v.begin(), v.end(), compare);

    return 0;
}

Minus the output code, this produces the correct sort. I hate using two separate data structures, but there doesn't seem to be a way to "re-sort" a tree based on a different key. Are there any gross inefficiencies here that I can avoid? Can anyone think of a better way to do this?

Note that I'm assuming that using Node instances (instead of Node pointers) is going to be faster than new'ing and delete'ing every node that's used here. Is this a reasonable assumption, or do you think new/delete would be faster than copying these small-ish structs?

Edit: Interesting. I had never known about multimap, but using the implementation provided below (thanks, Naveen), it looks like multimap performs worse. (Note that my intent here was a fast implementation; memory is not an issue. I should have pointed that out.) Using this implementation:

class Timer
{
public:
    Timer()
        : mStart(0)
    {
    }

    void Start()
    {
        mStart = std::clock();
    }

    double Mark()
    {
        std::clock_t curr = std::clock();
        double f = (curr - mStart)/((double)CLOCKS_PER_SEC);
        mStart = curr;
        return f;
    }

private:
    std::clock_t mStart;
};

struct Node
{
    int Size;
    int Count;

    Node()
        : Size(0), Count(0)
    {
    }

    Node(int size)
        : Size(size), Count(1)
    {
    }
};

void map_insert(std::map<int, Node> &map, int key, int size)
{
    std::map<int, Node>::iterator itr = map.find(key);

    if (itr != map.end())
    {
        itr->second.Count++;
        itr->second.Size += size;
    }
    else
    {
        map[key] = Node(size);
    }
}

bool compare(const std::pair<int, Node> &a1, const std::pair<int, Node> &a2)
{
    return a1.second.Size < a2.second.Size;
}

int make_size(int i, int size_max)
{
    return (7 * i) % size_max;
}

int make_key(int i, int key_max)
{
    return (11 * i) % key_max;
}

void first_impl(int max, int size_max, int key_max)
{
    std::cout << "first_impl:" << std::endl;
    double total = 0;
    double curr = 0;
    Timer t;
    t.Start();

    {
        std::map<int, Node> _map;

        for (int i = 0; i < max; ++i)
            map_insert(_map, make_key(i, key_max), make_size(i, size_max));

        total += curr = t.Mark();
        std::cout << "\tinsert: " << curr << std::endl;

        std::vector<std::pair<int, Node>> v(_map.begin(), _map.end());

        total += curr = t.Mark();
        std::cout << "\tcreate: " << curr << std::endl;

        std::sort(v.begin(), v.end(), compare);

        total += curr = t.Mark();
        std::cout << "\tsort: " << curr << std::endl;
    }

    total += curr = t.Mark();
    std::cout << "\tcleanup: " << curr << std::endl;

    std::cout << "\ttotal: " << total << std::endl;
}

void second_impl(int max, int size_max, int key_max)
{
    std::cout << "second_impl:" << std::endl;
    double total = 0;
    double curr = 0;
    Timer t;
    t.Start();

    {
        std::map<int, Node> res;
        typedef std::multimap<int, int> MultiMap;
        MultiMap mMap;

        for (int i = 0; i < max; ++i)
            mMap.insert(std::make_pair(make_key(i, key_max), make_size(i, size_max)));

        total += curr = t.Mark();
        std::cout << "\tinsert: " << curr << std::endl;

        std::multimap<int, int>::iterator iter = mMap.begin();
        std::multimap<int, int>::iterator endIter = mMap.end();
        for(; iter != endIter; ++iter)
        {
            int val = iter->first;
            if (res.find(val) != res.end())
            {
                continue;
            }

            std::pair<MultiMap::iterator, MultiMap::iterator> iterPair = mMap.equal_range(val);
            Node n;
            n.Size = val;  // note: stores the key here, mirroring n.val in Naveen's version below
            n.Count = mMap.count(val);

            int size = 0;
            for (; iterPair.first != iterPair.second; ++iterPair.first)
            {
                size += iterPair.first->second;
            }

            res[size] = n;
        }
        total += curr = t.Mark();
        std::cout << "\tsort: " << curr << std::endl;
    }

    total += curr = t.Mark();
    std::cout << "\tcleanup: " << curr << std::endl;

    std::cout << "\ttotal: " << total << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
    const int size_max = 31;
    const int key_max = 1019;
    const int max = 1000000;
    first_impl(max, size_max, key_max);
    second_impl(max, size_max, key_max);

    return 0;
}

Results look something like this:

first_impl:
        insert: 0.094
        create: 0
        sort: 0
        cleanup: 0
        total: 0.094
second_impl:
        insert: 1.653
        sort: 46.894
        cleanup: 66.081
        total: 114.628

The second implementation is obviously slower. It looks like the fact that the total number of unique keys is far lower than the total number of entries (around 1000 unique keys is representative of my data set) makes std::map the winner here, because it quickly reaches a stable state where no more nodes need to be allocated. I completely missed this fact before I did this secondary investigation.

It looks like my original implementation is better than multimap, and since I'm unwilling to take a dependency on Boost, I think I have my answer. Thanks all!

+2  A: 

Depending on what complexity you need, you have two options.

The first is to use a multimap container to hold the values and then iteratively use equal_range to generate the output. This gives you fast inserts but slow output.

The second option is to use boost::multi_index with member functions as indices, calculating the sum and count values on insert. This gives you slow inserts but fast output.
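
A rough sketch of that second option (illustrative only: the Entry struct, the index order, and the add_entry helper are made-up names for this sketch, not anything prescribed by Boost):

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>

struct Entry
{
    int Key;
    int Size;   // running total size for this key
    int Count;  // number of entries seen for this key
};

namespace bmi = boost::multi_index;

// One container, two ordered views: unique by Key, non-unique by Size.
typedef bmi::multi_index_container<
    Entry,
    bmi::indexed_by<
        bmi::ordered_unique< bmi::member<Entry, int, &Entry::Key> >,
        bmi::ordered_non_unique< bmi::member<Entry, int, &Entry::Size> >
    >
> EntrySet;

void add_entry(EntrySet &set, int key, int size)
{
    EntrySet::nth_index<0>::type &byKey = set.get<0>();
    EntrySet::nth_index<0>::type::iterator it = byKey.find(key);

    if (it == byKey.end())
    {
        Entry e = { key, size, 1 };
        set.insert(e);
    }
    else
    {
        Entry e = *it;
        e.Size += size;
        ++e.Count;
        byKey.replace(it, e);  // the Size index re-orders itself automatically
    }
}

Iterating set.get<1>() then walks the entries in ascending total-size order with no separate sort step.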

Kirill V. Lyadvinsky
+2  A: 

Following is sample code that implements this using std::multimap and std::map, together with the equal_range and count methods of these classes.

std::map<int, Node> res;
typedef std::multimap<int, int> MultiMap;
MultiMap mMap;
mMap.insert(std::make_pair(1, 3));
mMap.insert(std::make_pair(3, 27));
mMap.insert(std::make_pair(7, 7));
mMap.insert(std::make_pair(3, 2));
mMap.insert(std::make_pair(1, 1));

std::multimap<int, int>::iterator iter = mMap.begin();
std::multimap<int, int>::iterator endIter = mMap.end();
while (iter != endIter)
{
    int val = iter->first;
    std::pair<MultiMap::iterator, MultiMap::iterator> iterPair = mMap.equal_range(val);
    Node n;
    n.val = val;
    n.count = mMap.count(val);

    int size = 0;
    for (; iterPair.first != iterPair.second; ++iterPair.first)
    {
        size += iterPair.first->second;
    }

    res[size] = n;

    iter = iterPair.second;
}

Node is defined as:

struct Node
{
    int val;
    int count;

    Node() : val(0), count(0)
    {
    }
};

Note that the key for the result map is size and not count.

Naveen
Thank you for the code. This really helped narrow down the fact that map/vector/sort performs better in my specific case. I'd upvote you, but I don't have the rep. =/
LCC
A: 

One thing to note: in VS2008, at least, when you assign map[key] = Node(size);, you actually end up constructing three separate Node instances. The upshot is that only the one you declare gets created on the stack; the other two are created on the heap. So by using this version, you're actually incurring twice the overhead you would if you used a pointer and assumed responsibility for deleting all your instances at the end.
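
If that overhead matters, one way to sidestep the default-construct-then-assign pattern is to use insert(), which also tells you whether the key was new (a sketch; the temporary pair is still copied into the container, but nothing gets default-constructed and then assigned over):

void map_insert(std::map<int, Node> &map, int key, int size)
{
    // insert() returns (iterator, bool); the bool is false if the key existed.
    std::pair<std::map<int, Node>::iterator, bool> r =
        map.insert(std::make_pair(key, Node(size)));

    if (!r.second)  // key was already present: aggregate into it
    {
        r.first->second.Count++;
        r.first->second.Size += size;
    }
}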

Dathan
+6  A: 

A multimap<> may help you.

There are multiple entries in the data set with the same key.
multimap<> can handle duplicate keys, maps cannot.

How many of each key are there in the data set
multimap<>::count() takes a key and returns the number of matching elements.

For each key what is the total size of them
multimap<>::equal_range() takes a key and returns a std::pair< multimap<>::iterator, multimap<>::iterator >, where the first iterator points to the first element matching the key and the second points one past the last. They can be iterated as though they were begin and end. So using those, it would be a simple loop to calculate the total size for each key.
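
For example (a sketch, where data is assumed to be a std::multimap<int, int> mapping key to size):

typedef std::multimap<int, int> MMap;

std::pair<MMap::iterator, MMap::iterator> range = data.equal_range(key);
int totalSize = 0;
for (MMap::iterator it = range.first; it != range.second; ++it)
    totalSize += it->second;  // accumulate the sizes stored under this key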

Obviously, it doesn't entirely suit your needs, and if you are going to be operating on large data sets, perhaps you would gain some valuable performance by implementing a custom container. Good luck!

0xC0DEFACE
Using Naveen's code, it looks as if map/vector/sort is faster because my total number of unique keys is low compared to the total number of entries in the data set. Testing this against a multimap implementation gave me more confidence that the original solution I came up with should be good enough for my purposes. Thanks for the info!
LCC
+2  A: 

If you can use Boost, you can make use of Boost.MultiIndex. It allows you to have a container with two ordered indices (in your example, an index by key and an index by size). As for memory efficiency, it is said that node compression was implemented for Boost.MultiIndex's ordered indices, with the result that:

Size of ordered indices node headers have been reduced by 25% on most platforms

Also take a look at this example and its results: Results for 2 ordered indices. So even when you simply use boost::multi_index with ordered indices, it uses less memory than std::multiset from MS VS 8.0 or gcc.

As for your solution, I think you can expect that Boost.MultiIndex will use less memory than your implementation. However, if you want to compare the two solutions exactly, you can do this: write your own counting allocator, add it to your containers, and find out how much memory has been used. Then do the same thing using Boost.MultiIndex with your counting allocator. This is an example of an allocator; you need to modify it slightly in order to count the number of bytes that have been allocated and deallocated.
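
A minimal counting-allocator sketch along those lines (C++03-style to match the rest of this thread; CountingAllocator and its static byte counters are illustrative names, not a standard facility):

#include <cstddef>
#include <memory>

template <typename T>
struct CountingAllocator : std::allocator<T>
{
    static std::size_t allocated;    // total bytes handed out
    static std::size_t deallocated;  // total bytes given back

    template <typename U>
    struct rebind { typedef CountingAllocator<U> other; };

    CountingAllocator() {}
    template <typename U>
    CountingAllocator(const CountingAllocator<U> &) {}

    T *allocate(std::size_t n, const void *hint = 0)
    {
        allocated += n * sizeof(T);
        return std::allocator<T>::allocate(n, hint);
    }

    void deallocate(T *p, std::size_t n)
    {
        deallocated += n * sizeof(T);
        std::allocator<T>::deallocate(p, n);
    }
};

template <typename T> std::size_t CountingAllocator<T>::allocated = 0;
template <typename T> std::size_t CountingAllocator<T>::deallocated = 0;

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T> &, const CountingAllocator<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T> &, const CountingAllocator<U> &) { return false; }

Plugging it in looks like std::map<int, Node, std::less<int>, CountingAllocator<std::pair<const int, Node> > >; the map rebinds the allocator to its internal node type, so allocated - deallocated approximates the container's live memory.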

skwllsp
+2  A: 

std::map is an associative container, so the map will be kept in sorted order with respect to the key. And since you are using duplicate keys here, a multimap will solve your purpose.

Vivek
A: 

std::map<Key, Data, Compare, Alloc> has a Compare parameter; just supply a comparator that defines a strict weak ordering there.

struct Node
{
    int Size;
    int Count;
};

struct CompareNode
{
    bool operator()(const Node& a, const Node& b) const
    {
        return a.Size < b.Size;
    }
};

std::map<Node, std::string, CompareNode> xxx;
J-16 SDiZ
+1  A: 

Here is a small improvement to your solution: it is shorter and avoids the second key lookup when inserting a new key (note that it relies on the zero-initialization in your Node default constructor):

void map_insert(std::map<int, Node> &map, int key, int size)
{
    Node &n = map[key];  // default-constructs a zeroed Node on first access
    ++n.Count;
    n.Size += size;
}

But the optimal way probably depends on the range of your keys. If it's always small (say, 1..1000), a simple vector is the best choice. If it's bigger, a hash_map gives better results, because you don't seem to need the key ordering that map provides.

I tested it, and it seems to give a noticeable improvement for your ~1000-key case, but it also depends on your key distribution. You just need to replace std::map with std::hash_map and then fix up the header includes. However, std::hash_map may have some portability problems. You could still write your own hashing system, though (and even adapt it to your key distribution).

EDIT: unordered_map seems to be the future standard replacement for hash_map. At least, it fixes the deprecation warning on gcc 4.3.
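
The insert then looks the same with the hash-based container (a sketch; on gcc 4.3 the class lives in std::tr1 via <tr1/unordered_map>, while C++11 provides std::unordered_map in <unordered_map>):

#include <unordered_map>  // pre-C++11: <tr1/unordered_map> and std::tr1::unordered_map

void map_insert(std::unordered_map<int, Node> &map, int key, int size)
{
    Node &n = map[key];  // value-initializes a zeroed Node on first access
    ++n.Count;
    n.Size += size;
}

The vector copy and std::sort steps stay exactly as before; the hash map's iteration order is unspecified, but you were re-sorting anyway.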

Alink
Ah, good call. I missed the better insertion pattern and hash_map/unordered_map. Thanks!
LCC