I'm intersecting some sets of numbers, and doing this by storing a count of each time I see a number in a map.
I'm finding the performance be very slow.
Details: - One of the sets has 150,000 numbers in it - The intersection of that set and another set takes about 300ms the first time, and about 5000ms the second time - I haven't done any profiling yet, but every time I break the debugger while doing the intersection its in malloc.c!
So, how can I improve this performance? Switch to a different data structure? Some how improve the memory allocation performance of map?
Update:
- Is there any way to ask std::map or boost::unordered_map to pre-allocate some space?
- Or, are there any tips for using these efficiently?
Update2:
See http://stackoverflow.com/questions/1056366/fast-c-container-like-the-c-hashsett-and-dictionaryk-v
Update3:
I benchmarked set_intersection and got horrible results:
(set_intersection) Found 313 values in the intersection, in 11345ms
(set_intersection) Found 309 values in the intersection, in 12332ms
Code:
int runIntersectionTestAlgo()
{
set<int> set1;
set<int> set2;
set<int> intersection;
// Create 100,000 values for set1
for ( int i = 0; i < 100000; i++ )
{
int value = 1000000000 + i;
set1.insert(value);
}
// Create 1,000 values for set2
for ( int i = 0; i < 1000; i++ )
{
int random = rand() % 200000 + 1;
random *= 10;
int value = 1000000000 + random;
set2.insert(value);
}
set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));
return intersection.size();
}