ansaurus

Question

Is this usage of unordered map efficient/right way?

Answer 1

A:

You're starting off on the wrong foot. A map (ordered or otherwise) is intended to store a key along with some associated data. In your case, you're only storing a number (twice, as both the key and the data). For this situation, you want a set (again, ordered or otherwise) instead of a map.
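A minimal sketch of the difference. The question's own code isn't reproduced here, so this assumes it stored each value as `m[x] = x`. Both function names are hypothetical. The two versions end up with the same distinct values, but the set stores each one only once.

```cpp
#include <cstddef>
#include <map>
#include <set>

// Hypothetical "before": each value is stored twice, as key and as mapped data.
std::size_t distinct_with_map(const int *first, const int *last) {
    std::map<int, int> m;
    for (; first != last; ++first)
        m[*first] = *first;  // key and mapped value are identical
    return m.size();
}

// "After": a set stores only the key, which is all a membership test needs.
std::size_t distinct_with_set(const int *first, const int *last) {
    std::set<int> s;
    for (; first != last; ++first)
        s.insert(*first);  // duplicates are silently ignored
    return s.size();
}
```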

I'd also avoid at least the first for loop, and use std::copy instead:

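// There are better ways to do this, but it'll work for now. (A function
// template rather than a macro, so it can't clobber calls like myset.end().)
template <class T, std::size_t N>
T *end(T (&array)[N]) { return array + N; }

std::copy(samplearray, 
          end(samplearray), 
          std::inserter(myset, myset.end()));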

If you only need to count how many items are common between the two sets, your for loop is fairly reasonable. If you need/want to actually know what items are common between them, you might want to consider using std::set_intersection:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>

std::set<int> myset, test_set, common;

std::copy(samplearray, end(samplearray), std::inserter(myset, myset.end()));
std::copy(testarray, end(testarray), std::inserter(test_set, test_set.end()));

std::set_intersection(myset.begin(), myset.end(), 
                      test_set.begin(), test_set.end(), 
                      std::inserter(common, common.end()));

// show the common elements (including a count):
std::cout << common.size() << " common elements:\t";
std::copy(common.begin(), common.end(), 
          std::ostream_iterator<int>(std::cout, "\t"));

Note that you don't need an actual set to use set_intersection -- all you need is a sorted collection of items, so if you prefer, you could just sort your two arrays, then use set_intersection on them directly. Likewise, the result could go in some other collection (e.g., a vector) if you'd rather.
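A minimal sketch of that approach, assuming the data is already in vectors (`common_sorted` is a hypothetical name). If an input contains duplicates, `set_intersection` keeps each common value as many times as it appears in both ranges (the smaller of the two counts).

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Sorts copies of both inputs, then intersects them. set_intersection only
// requires sorted ranges, not std::set; the result is collected in a vector.
std::vector<int> common_sorted(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

For example, `common_sorted({3, 1, 2}, {2, 5, 3})` returns a vector holding 2 and 3.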

Jerry Coffin 2010-10-06 17:13:19

Isn't range-constructing preferred (as in, `std::set<int> myset(samplearray, end(samplearray));`)?

Cubbi 2010-10-06 17:16:39

@Cubbi: In the case of something like a vector, it's definitely preferred. In the case of a set, most such preference would be personal, not general. Depending on the source of the data, range-based construction often isn't suitable/possible though, and trying to teach when to use/avoid it would be a lot (probably too much) for a single answer...

Jerry Coffin 2010-10-06 17:20:15
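For comparison, a minimal sketch of the two styles discussed above (both function names are hypothetical). They produce the same set; the `std::copy` form also works when the set already exists or is filled from more than one source.

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Range construction: the set is built directly from the array.
std::set<int> build_by_range(const int *first, const int *last) {
    return std::set<int>(first, last);
}

// std::copy + std::inserter into an existing (here empty) set.
std::set<int> build_by_copy(const int *first, const int *last) {
    std::set<int> s;
    std::copy(first, last, std::inserter(s, s.end()));
    return s;
}
```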

@jerry: I actually want to count the items. Wouldn't set intersection be a costly operation on large files (> 10 GB) because of the sort?

Sunil 2010-10-06 17:54:02

@jerry: Moreover, searching for an element in a set seems to be logarithmic in the size of the set, while in an unordered map it is O(1), correct?

Sunil 2010-10-06 18:02:29

@Sunil: Yes, if you just want a count, `set_intersection` probably isn't a good choice. Yes, operations on `set` are logarithmic and on unordered_* are constant (on average) -- but that doesn't necessarily mean much. If it is a problem for your situation, consider using `unordered_set` (which I'd intended to mention, but looking back, apparently didn't, at least not directly). Keep in mind, however, that `set` tends to be more friendly to virtual memory than `unordered_set`, so if you're reaching the limits of memory, it may work better.

Jerry Coffin 2010-10-06 18:18:16
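A minimal sketch of the counting loop, written so that switching between `set` and `unordered_set` only changes a type (`count_common` is a hypothetical name, and `unordered_set` assumes C++11). It counts every value in the test range that is found, so a value repeated there is counted once per occurrence.

```cpp
#include <cstddef>
#include <set>
#include <unordered_set>

// Counts how many values in [first, last) are present in `lookup`.
// With std::set each probe is O(log n); with std::unordered_set it is
// O(1) on average (O(n) worst case). The loop itself is identical.
template <class Lookup>
std::size_t count_common(const Lookup &lookup,
                         const int *first, const int *last) {
    std::size_t matches = 0;
    for (; first != last; ++first)
        if (lookup.count(*first) != 0)  // count() is 0 or 1 for a set
            ++matches;
    return matches;
}
```

For large inputs, calling `reserve()` on the `unordered_set` before filling it avoids repeated rehashing while it grows.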

@jerry: The copy line in your program works only when it's like this: `std::copy(samplearray, end(samplearray), std::inserter(myset),myset.end());`. Is this right? My concern is that I do not see how `myset.end()` works, because when we start inserting the first element, the begin and the end are the same. If we were searching or doing some other operation I could understand the reason, but why for insertion?

Sunil 2010-10-06 19:32:32

@Sunil: Oops, yes, you're quite right that it needs that final parameter (though it should be: `std::inserter(myset, myset.end())`). As to why it needs it: mostly because they never provided an overload specifically for associative containers, and with a sequence it really needs to know where you want to insert the new items.

Jerry Coffin 2010-10-06 19:37:53
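A small sketch of what that position argument does (function names are hypothetical). For a `set`, the iterator passed to `std::inserter` is only a hint, so `begin()` and `end()` give the same contents; for a sequence such as `vector`, it decides where the new items go.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// For a set, the position is just a hint: the elements end up in sorted
// order whichever iterator is passed.
std::set<int> fill_set(const std::vector<int> &src, bool hint_at_begin) {
    std::set<int> s;
    if (hint_at_begin)
        std::copy(src.begin(), src.end(), std::inserter(s, s.begin()));
    else
        std::copy(src.begin(), src.end(), std::inserter(s, s.end()));
    return s;
}

// For a sequence, the position matters: the new items are spliced in,
// in order, right after the first existing element (dst must be non-empty).
std::vector<int> splice_after_first(std::vector<int> dst,
                                    const std::vector<int> &src) {
    std::copy(src.begin(), src.end(), std::inserter(dst, dst.begin() + 1));
    return dst;
}
```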

Answer 2

A:

As mentioned by Jerry, you could use a for loop for the search if you only need to know the number of matches. If that is the case, I would recommend using an unordered_set since you don't need the elements to be sorted.

Jaime Soto 2010-10-06 17:30:50

Why would anybody use `unordered_map` when `unordered_set` does the same job using less space?

Sunil 2010-10-06 18:24:41
