tags:

views:

78

answers:

2

I want to learn about mapping functions in c/c++ in general so this is a basic program on unordered mapping. I use unordered mapping because my input data are not sorted and I read that unordered_map is very efficient. Here I've an array with which I'm creating the hash table and use the lookup function to find if the elements in another array are in the hash table or not. I've several questions regarding this implementation:

#include <stdio.h>
#include <unordered_map>
using namespace std;

typedef std::unordered_map<int,int> Mymap; 
int main()
{
int x,z,l=0;
int samplearray[5] = {0,6,4,3,8};
int testarray[10] = {6,3,8,67,78,54,64,74,22,77};

Mymap c1;

for ( x=0;x< sizeof(samplearray)/sizeof(int);x++)
 c1.insert(Mymap::value_type(samplearray[x], x));

for ( z=0;z< sizeof(testarray)/sizeof(int);z++)
 if((c1.find(testarray[z]) != c1.end()) == true)
  l++;

printf("The number of elements equal are : %d\n",l);
printf("the size of samplearray and testarray are : %d\t%d\n",sizeof(samplearray)/sizeof(int),sizeof(testarray)/sizeof(int));
}
  1. First of all, is this a right way to implement it? I'm getting the answers right but seems that I use too much of for loop.
  2. This seems fairly okay with very small data but if I'm dealing with files of size > 500MB then this seems that, if I create a hash table for a 500MB file then the size of the hash table itself will be twice as much which is 1000MB. Is this always the case?
  3. What is the difference between std::unordered map and boost::unordered map?

Finally, a small request. I'm new to C/C++ so if you are giving suggestions like using some other typedef/libraries, I'd highly appreciate if you could use a small example or implement it on my code. Thanks

+3  A: 

You're starting off on the wrong foot. A map (ordered or otherwise) is intended to store a key along with some associated data. In your case, you're only storing a number (twice, as both the key and the data). For this situation, you want a set (again, ordered or otherwise) instead of a map.

I'd also avoid at least the first for loop, and use std::copy instead:

// There are better ways to do this, but it'll work for now:
#define end(array) ((array) + (sizeof(array)/sizeof(array[0]))

std::copy(samplearray, 
          end(samplearray), 
          std::inserter(Myset));

If you only need to count how many items are common between the two sets, your for loop is fairly reasonable. If you need/want to actually know what items are common between them, you might want to consider using std::set_intersection:

std::set<int> myset, test_set, common;

std::copy(samplearray, end(samplearray), std::inserter(myset));
std::copy(testarray, end(testarray), std::inserter(test_set));

std::set_intersection(myset.begin(), myset.end(), 
                      test_set.begin(), test_set.end(), 
                      std::inserter(common));

// show the common elements (including a count):
std::cout <<common.size() << " common elements:\t";
std::copy(common.begin(), common.end(), 
          std::ostream_iterator<int>(std::cout, "\t");

Note that you don't need to have an actual set to use set_intersection -- all you need is a sorted collection of items, so if you preferred to you could just sort your two arrays, then use set_intersection on them directly. Likewise, the result could go in some other collection (e.g., a vector) if you prefer.

Jerry Coffin
Isn't range-constructing preferred (as in, `std::set<int> myset(samplearray, end(samplearray));`)?
Cubbi
@Cubbi: In the case of something like a vector, it's definitely preferred. In the case of a set, most such preference would be personal, not general. Depending on the source of the data, range-based construction often isn't suitable/possible though, and trying to teach when to use/avoid it would be a lot (probably too much) for a single answer...
Jerry Coffin
@jerry: I actually want to count the items. Set intersection would be an costly operation on large files in the size of > 10GB because of the sort function. Is it not?
Sunil
@jerry: Moreover searching an element on set seems to be logarithmic in size while unordered map is O(1), correct?
Sunil
@Sunil: yes, if you just want a count, `set_intersection` probably isn't a good choice. Yes, operations on `set` are logarithmic and on unordered_* are constant -- but that doesn't necessarily mean much. If it is a problem for your situation, consider using `unordered_set` (which I'd intended to mention, but looking back, apparently didn't at least directly). Keep in mind, however, that `set` tends to be more friendly to virtual memory than `unordered_set`, so if you're reaching the limits of memory, it may work better.
Jerry Coffin
@jerry: the copy line in your program works only when its like this `std::copy(samplearray, end(samplearray), std::inserter(myset),myset.end());`. Is this right? My concern is I do not see how the `myset.end()` works because when we start to inserting the first element the begin and the end are the same. If we are searching or doing someother operation I can understand the reason but why when insertion?
Sunil
@Sunil: Oops, yes, you're quite right that it needs that final parameter (though it should be: `std::inserter(myset, myset.end())`). As to why it needs it: mostly because they never provided an overload specifically for associative containers, and with a sequence it really needs to know where you want to insert the new items.
Jerry Coffin
A: 

As mentioned by Jerry, you could use a for loop for the search if you only need to know the number of matches. If that is the case, I would recommend using an unordered_set since you don't need the elements to be sorted.

Jaime Soto
Why would anybody use `unordered_map` when they have `unordered_set` doing the same function with a lesser space?
Sunil