I need some kind of priority queue to store <key, value> pairs. Values are unique, but keys aren't. I will be performing the following operations (most common first):

  1. random insertion;
  2. retrieval (and removal) of all elements with the least key;
  3. random removal (by value).

I can't use std::priority_queue because it only supports removing the head.

For now, I'm using an unsorted std::list. Insertion just pushes new elements to the back (O(1)). Operation 2 sorts the list with list::sort (O(N log N)) before performing the actual retrieval. Removal, however, is O(N), which is a bit expensive.

Any idea of a better data structure?

+1  A: 

If you're using Visual Studio, it ships with hash_multimap. I should also add that Boost has an unordered multimap (boost::unordered_multimap). If you need ordered containers, there are the STL's std::multimap and std::multiset.

wheaties
I think the standard container should be the default option in most situations, if nothing else because it is the standard container.
Billy ONeal
pfft, standard containers. If people used those, programs would crash less. Then people like me wouldn't have to explain to a boss why an unrelated section of code is impeding production of new functionality and we just got "lucky" it never happened before. Then we'd never have to hire high-priced consultants to write "better" containers that didn't crash until they did. You see the cycle of life you're attempting to ruin here!?
wheaties
See my solution.
Helltone
@Helltone multimap holds things in order but can take multiple values per key. Inserts are O(log N), while retrievals can be up to O(N). I think you should go with multimap.
wheaties
+4  A: 

Can you reverse the order of the collection, i.e. store the elements in <value, key> order?

Then you could just use std::map, with O(log n) insertion, O(n) retrieval (traversing the whole collection), and O(log n) random removal of a value (which would be the key of said map).

If you could find a map implementation based on hashes instead of trees (std::map is tree-based), the times would be even better: O(1), O(n), O(1).

pajton
+1. Or the nonstandard container `hash_map`/soon to be standard `std::unordered_map`.
Billy ONeal
It needs to order and retrieve by key, so I'm not sure this would work.
Mark B
@BillyONeal yes, `hash_map` would yield even better times: `O(1)`, `O(n)`, `O(1)`. @MarkB he didn't state that anywhere, so why do you think so?
pajton
@pajton: Actually, I like your answer better -- the standard containers should be used over nonstandard containers unless you have *very* good reason to do otherwise because the standard containers are just that -- standard. And the hashed containers do not offer O(1) worst case performance -- but they do on average. For some applications amortized analysis is not allowable practice.
Billy ONeal
A: 

std::multimap seems to be what you are searching for.

It will store your objects ordered by key, and lets you retrieve the lowest/highest key value (begin(), rbegin()) and all the objects with a given key (equal_range, lower_bound, upper_bound).

(EDIT: if you have just a few items, say fewer than 30, you should also test the performance of just using a deque or a vector)

baol
A: 

If I understood well, your performance targets are fast (1) and (3), and (2) is not that important. In this case, and given that values are unique, why not just use a std::set<value> and do a sequential search for (2)? You'd have O(log n) for (1) and (3), and O(n) for (2). Better yet, if your STL has std::hash_set, you'd get close to O(1) for (1) and (3).

If you need something better than O(n) for (2), one alternative would be to have a set of priority queues.

Fabio Ceconello
I said that the operations were **most common first**. My initial solution is better than yours because it's O(1) for insertion.
Helltone
I don't think you'll be able to speed up the other operations without giving up a bit in (1); there's no magic here. Anyway, if you have a std::hash_set implementation, insertion is quite close to O(1).
Fabio Ceconello
Your solution, `Helltone`, has not been timed, so you don't know whether it's better. Notably, you forgot to consider the memory aspects of speed, like caching and heap usage. Unless you time it, you can't say for sure which is faster. An `O(1)` with a big constant is worse than `O(log(n))` for small values of `n`.
Matthieu M.
+3  A: 

When you need order, use an ordered container. There is no point in paying the cost of sorting later on.

Your current solution is:

  • Insertion O(1)
  • Retrieval O(N log N)
  • Removal O(N) (which is as good as you can get without keeping another index there)

Simply using a std::multimap you could have:

  • Insertion O(log N)
  • Retrieval O(log N) <-- much better, isn't it? We need to find the end of the range
  • Removal O(N)

Now, you could do slightly better with a std::map< key, std::vector<value> >:

  • Insertion O(log M) where M is the number of distinct keys
  • Retrieval O(1) (begin is guaranteed to be amortized constant time)
  • Removal O(N)

You can't really push the random removal... unless you're willing to keep another index there. For example:

typedef std::vector<value_type> data_value_t;
typedef std::map<key_type, data_value_t> data_t;

typedef std::pair<data_t::iterator,size_t> index_value_t;
  // where iterator gives you the right vector and size_t is an index in it

typedef std::unordered_map<value_type, index_value_t> index_t;

But keeping this second index up to date is error prone... and will be done at the expense of the other operations! For example, with this structure, you would have:

  • Insertion O(log M) --> insertion into the hash map itself is O(1)
  • Retrieval O(N/M) --> we need to de-index all the values of the retrieved vector from the secondary index; there are N/M of them on average
  • Removal O(N/M) --> finding the entry in the hash map is O(1), dereferencing is O(1), and removing from the vector is O(N/M) because we need to shift roughly half of the vector's contents. Using a list would yield O(1)... but might not actually be faster (it depends on the number of elements, because of the memory trade-off).

Also bear in mind that hash map complexities are amortized. Trigger a reallocation because you exceeded the load factor, and that particular insertion will take a very long time.

I'd really go with the std::map<key_type, std::vector<value_type> > in your stead. That's the best bang for the buck.

Matthieu M.
boost has multi-index container. So if you're going to go that route, use someone else's already debugged code.
Eric H.
Unfortunately I don't know of a way to obtain the final structure with a `MultiIndex` container. I mean, you could ask for `multiset` on key and `hash_map` on values with unique constraint, but that's not the final design I devised. Probably good enough though.
Matthieu M.
A: 

Ok, so I've tested many options and ended up with something based on Matthieu M.'s idea. I'm currently using a std::map<key_type, std::list<value_type> >, where each value_type holds a std::list<value_type>::iterator to itself, which is useful for removal.

Removal must check whether the list is empty, which implies a map query and possibly a call to erase. The worst case is when keys are distinct: O(log N) for insertion, O(1) for retrieval and O(log N) for removal. I've got very good experimental results compared to the other alternatives on my test machine.

Using a std::vector is less efficient, both in theoretical complexity (O(N) worst case for removal when keys are identical) and in the experiments I've been doing.

Helltone