ansaurus

Question

Answer 1

+4 A:

Use this...

vector<int> set1(10000);
vector<int> set2(1000);

... to get vectors of non-zero initial size. Then don't use push_back, but just update the values directly.
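A minimal sketch of the two ways to avoid repeated reallocation while filling a vector: construct it with a size and assign by index, or call reserve() and keep using push_back. The function names fill_by_index and fill_by_push_back are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Option 1: construct with a size, then assign by index (no push_back).
std::vector<int> fill_by_index(std::size_t n) {
    std::vector<int> v(n);                 // n zero-initialized elements
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<int>(i);
    return v;
}

// Option 2: reserve capacity up front, then keep using push_back.
std::vector<int> fill_by_push_back(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);                          // capacity >= n, size is still 0
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(static_cast<int>(i));
    return v;
}
```

Both produce the same contents; the difference is only in how the storage is obtained.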

Roddy 2009-06-29 21:32:30

Or call reserve(1000), but continue using push_back

GMan 2009-06-29 21:41:40

or just don't include this in your timing in the first place. Your setup code should not be part of the benchmark.

jalf 2009-06-29 21:49:39

I want to include the setup time, because my data is not in one of these data structures at the start... so if the approach requires me to put it into a data structure then I need to include that time.

Alex Black 2009-06-29 22:00:52

@jalf. Fair point. I was assuming he actually wanted to measure that bit as well...

Roddy 2009-06-29 22:01:06

I agree that there is a vast difference between initializing something to 10000 bytes for the C# and doing multiple re-allocs on the C++ side.

Kevin 2009-06-29 22:02:47

nevertheless, it is not part of this test. You'll get much more useful benchmarking data by treating them as separate problems. First worry about the intersection performance, and then do a separate test for the "build initial data structures" part.

jalf 2009-06-29 22:02:58

I modified the C++ implementation that uses ordered_set and vector to call reserve() on the vectors. Not much change.

Alex Black 2009-06-29 22:18:05

Answer 2

A:

I know your solution is working fine, but have you tried using the STL implementations, such as set_intersection and includes?

It might be optimized for your platform already, so I'd give it a shot.

Edison Gustavo Muenz 2009-06-29 21:48:58

My C++ code includes an implementation using set_intersection. I will take a look at "includes".

Alex Black 2009-06-29 22:00:19

Answer 3

+19 A:

There are several problems with your test.

First, you are not testing set intersection, but "create a couple of arrays, fill them with random numbers, and then perform set intersection". You should only time the portion of the code you're actually interested in. Even if you're going to want to do those things, they should not be benchmarked here. Measure one thing at a time, to reduce uncertainty. If you want your C++ implementation to perform better, you first need to know which part of it is slower than expected. Which means you have to separate setup code from intersection test.

Second, you should run the test a large number of times to take possible caching effects and other uncertainties into account. (And probably output one total time for, say, 1000 runs, rather than an individual time for each.) That way you reduce the uncertainty from the timer, which might have limited resolution and report inaccurate results when used in the 0-20ms range.
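One way to report a single total for many runs, sketched with std::chrono from C++11 (an assumption; the code in this thread uses GetTickCount and a custom Timer class instead). The helper name time_total_ms is hypothetical.

```cpp
#include <chrono>

// Runs `work` `runs` times and returns the total elapsed wall time in milliseconds.
// Only the calls to work() are timed; do all setup before calling this.
template <typename Work>
double time_total_ms(int runs, Work work) {
    typedef std::chrono::steady_clock clock;
    clock::time_point start = clock::now();
    for (int i = 0; i < runs; ++i)
        work();
    clock::time_point stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

You would build the input containers first, then pass a lambda that performs only the intersection, so the setup cost never enters the measurement.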

Further, as far as I can read from the docs, the input to set_intersection should be sorted, which set2 won't be. And there seems to be no reason to use unordered_map, when unordered_set would be a far better match for what you're doing.

About the setup code being needed, note that you probably don't need to populate vectors in order to run the intersection. Both your own implementation and set_intersection work on iterators already, so you can simply pass them a pair of iterators to the data structures your inputs are in already.

A few more specific comments on your code:

Use ++iterator instead of iterator++
Rather than calling vector.end() at each loop iteration, call it once and cache the result
Experiment with using sorted vectors vs std::set vs unordered_set (not unordered_map)

Edit:

I haven't tried your C# version, so I can't compare the numbers properly, but here's my modified test. Each is run 1000 times, on a Core 2 Quad 2.5GHz with 4GB RAM:

std::set_intersection on std::set: 2606ms
std::set_intersection on tr1::unordered_set: 1014ms
std::set_intersection on sorted vectors: 171ms
std::set_intersection on unsorted vectors: 10140ms

The last one is a bit unfair, because it has to both copy and sort the vectors. Ideally, only the sort should be part of the benchmark. I tried creating a version that used an array of 1000 unsorted vectors (so I wouldn't have to copy the unsorted data in each iteration), but the performance was about the same, or a bit worse, because this would cause constant cache misses, so I reverted to this version.

And my code:

#define _SECURE_SCL 0

#include <cstdlib>
#include <ctime>
#include <vector>
#include <set>
#include <iostream>
#include <algorithm>
#include <unordered_set>
#include <windows.h>

template <typename T, typename OutIter>
void stl_intersect(const T& set1, const T& set2, OutIter out){
    std::set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), out);
}

template <typename T, typename OutIter>
void sort_stl_intersect(T& set1, T& set2, OutIter out){
    std::sort(set1.begin(), set1.end());
    std::sort(set2.begin(), set2.end());
    std::set_intersection(set1.begin(), set1.end(), set2.begin(), set2.end(), out);
}


template <typename T>
void init_sorted_vec(T first, T last){
    for ( T cur = first; cur != last; ++cur)
    {
     int i = cur - first;
     int value = 1000000000 + i;
     *cur = value;
    }
}

template <typename T>
void init_unsorted_vec(T first, T last){
    for ( T cur = first; cur != last; ++cur)
    {
     int i = rand() % 200000 + 1;
     i *= 10;

     int value = 1000000000 + i;
     *cur = value;
    }
}

struct resize_and_shuffle {
    resize_and_shuffle(int size) : size(size) {}

    void operator()(std::vector<int>& vec){
     vec.resize(size);

    }
    int size;
};

int main()
{
    srand ( time(NULL) );
    std::vector<int> out(100000);

    std::vector<int> sortedvec1(100000);
    std::vector<int> sortedvec2(1000);

    init_sorted_vec(sortedvec1.begin(), sortedvec1.end());
    init_unsorted_vec(sortedvec2.begin(), sortedvec2.end());
    std::sort(sortedvec2.begin(), sortedvec2.end());

    std::vector<int> unsortedvec1(sortedvec1.begin(), sortedvec1.end());
    std::vector<int> unsortedvec2(sortedvec2.begin(), sortedvec2.end());

    std::random_shuffle(unsortedvec1.begin(), unsortedvec1.end());
    std::random_shuffle(unsortedvec2.begin(), unsortedvec2.end());

    std::vector<int> vecs1[1000];
    std::vector<int> vecs2[1000];

    std::fill(vecs1, vecs1 + 1000, unsortedvec1);
    std::fill(vecs2, vecs2 + 1000, unsortedvec2);

    std::set<int> set1(sortedvec1.begin(), sortedvec1.end());
    std::set<int> set2(sortedvec2.begin(), sortedvec2.end());

    std::tr1::unordered_set<int> uset1(sortedvec1.begin(), sortedvec1.end());
    std::tr1::unordered_set<int> uset2(sortedvec2.begin(), sortedvec2.end());

    DWORD start, stop;
    DWORD delta[4];

    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
     stl_intersect(set1, set2, out.begin());
    }
    stop = GetTickCount();
    delta[0] = stop - start;

    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
     stl_intersect(uset1, uset2, out.begin());
    }
    stop = GetTickCount();
    delta[1] = stop - start;

    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
     stl_intersect(sortedvec1, sortedvec2, out.begin());
    }
    stop = GetTickCount();
    delta[2] = stop - start;

    start = GetTickCount();
    for (int i = 0; i < 1000; ++i){
     sort_stl_intersect(vecs1[i], vecs2[i], out.begin());
    }
    stop = GetTickCount();
    delta[3] = stop - start;

    std::cout << "std::set_intersection on std::set: " << delta[0] << "ms\n";
    std::cout << "std::set_intersection on tr1::unordered_set: " << delta[1] << "ms\n";
    std::cout << "std::set_intersection on sorted vectors: " << delta[2] << "ms\n";
    std::cout << "std::set_intersection on unsorted vectors: " << delta[3] << "ms\n";


    return 0;
}

There's no reason why C++ should always be faster than C#. C# has a few key advantages that require a lot of care to compete with in C++. The primary one I can think of is that dynamic allocations are ridiculously cheap in .NET-land. Every time a C++ vector, set or unordered_set (or any other container) has to resize or expand, it is a very costly malloc operation. In .NET, a heap allocation is little more than adding an offset to a pointer.

So if you want the C++ version to compete, you'll probably have to solve that, allowing your containers to resize without having to perform actual heap allocations, probably by using custom allocators for the containers (perhaps boost::pool might be a good bet, or you can try rolling your own)

Another issue is that set_intersection only works on sorted input, and in order to reproduce test results that involve a sort, we have to make a fresh copy of the unsorted data in each iteration, which is costly (although again, using custom allocators will help a lot). I don't know what form your input takes, but it is possible that you can sort your input directly, without copying it, and then run set_intersection directly on that. (That would be easy to do if your input is an array or an STL container at least.)

One of the key advantages of the STL is that it is so flexible, it can work on pretty much any input sequence. In C#, you pretty much have to copy the input to a List or Dictionary or something, but in C++, you might be able to get away with running std::sort and set_intersection on the raw input.
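A rough sketch of running std::sort and set_intersection directly on raw input, assuming the input is a pair of plain int arrays that the caller is happy to have reordered. The function name intersect_in_place is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Sorts both raw arrays in place, then intersects them.
// The caller's data is reordered; the inputs are never copied.
std::vector<int> intersect_in_place(int* a, std::size_t na, int* b, std::size_t nb) {
    std::sort(a, a + na);
    std::sort(b, b + nb);

    std::vector<int> out;
    out.reserve(std::min(na, nb));   // the intersection cannot be larger than this
    std::set_intersection(a, a + na, b, b + nb, std::back_inserter(out));
    return out;
}
```

Raw pointers are random-access iterators, so both algorithms accept them without any wrapping container.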

Finally, of course, try running the code through a profiler and see exactly where the time is being spent. You might also want to try running the code through GCC instead. It's my impression that STL performance in MSVC is sometimes a bit quirky. It might be worth testing under another compiler just to see if you get similar timings there.

Finally, you might find these blog posts relevant for performance of C++ vs C#: http://blogs.msdn.com/ricom/archive/2005/05/10/416151.aspx

The moral of those is essentially that yes, you can get better performance in C++, but it is a surprising amount of work.

jalf 2009-06-29 21:55:25

Agreed, the test is not strictly set intersection, it also includes populating any data structures needed for the test. I've updated the test to run 1000 times.

Alex Black 2009-06-29 22:11:50

Populating the data structures is not part of *this* test though. Measure that separately. You are introducing a huge amount of uncertainty which essentially invalidates your results.

jalf 2009-06-29 22:13:43

I modified the code and re-ran the benchmarks with the population of the data structures done before the timed tests.

Alex Black 2009-06-29 22:24:14

Jalf: it looks to me like std::set is a sorted data structure. http://www.sgi.com/tech/stl/set.html

Alex Black 2009-06-29 22:28:05

oh right, I thought you were running that one on vectors too. (Why aren't you?)

jalf 2009-06-29 22:28:58

because the first example I found of how to use set_intersection showed it working on sets. If you think switching to vectors will speed things up I will give it a try.

Alex Black 2009-06-29 22:30:19

I don't know if it'll speed it up, but set_intersection works on any input iterator, as long as it is a sorted sequence (so if you use vectors, you'll have to first sort it)

jalf 2009-06-29 22:33:09

@Jalf: I modified the code to pass sorted vectors into set_intersection. If I sorted them before the test, then the intersection was done in 352ms (1000 times!) so very fast. But if I do the sort inside the test, then it takes 2401ms.

Alex Black 2009-06-29 22:48:28

2401 is still twice as good as your C# code though, isn't it?

jalf 2009-06-29 22:51:28

Yes, 2401ms is twice as fast as C# and very good. I like it. I'm just wondering if it's a valid test given the data in set1 is already sorted (my real data is not). I'm going to look at updating the test to work on two random (not sorted) sets.

Alex Black 2009-06-29 22:53:11

I've updated my answer with a few other suggestions btw

jalf 2009-06-29 22:53:37

Added my own benchmark + results as well

jalf 2009-06-29 23:07:25

Hey Jalf, that's awesome, very fast. However, would it be easy to adjust so that the vectors are re-sorted each time? My data isn't sorted, so I am going to have to sort it. In real life this won't be run 1000 times on pre-sorted vectors.

Alex Black 2009-06-29 23:12:05

Done. I test with both pre-sorted and unsorted vectors now. It looks like both set and unordered_set are a lot faster than in your own tests though. Can you reproduce the same results if you run my code?

jalf 2009-06-29 23:20:07

Nice. One more question: set1 is already sorted... I'm trying to find a good way to randomize it while still ensuring it has values from 0 to 100,000 (excluding the offset).

Alex Black 2009-06-29 23:22:08

looks like std::random_shuffle can do that :)

jalf 2009-06-29 23:23:32

nice. will try that.

Alex Black 2009-06-29 23:25:54

I posted an updated benchmark and code below, I'm not seeing the crazy fast results you are Jalf. (it was fast, until I shuffled set1 and sorted it each time)

Alex Black 2009-06-29 23:34:24

You're now getting the same times I'm getting, about 10s to use set_intersection on unsorted lists. This is a 2x improvement on where we started, but still 2x slower than C#.

Alex Black 2009-06-29 23:47:38

Does that one matter though, if set or unordered_set is still 4x faster than C#?

jalf 2009-06-29 23:59:04

Set: the sorting is done outside the benchmark... so doesn't seem fair. Unordered_set: does this work? You're passing what looks like an unsorted set to set_intersection...

Alex Black 2009-06-30 00:03:58

Set is an ordered data structure, remember? So whether the input it is built from is sorted or not doesn't matter. It sorts its input on insertion anyway. About unordered_set, I think you're right, that isn't valid, since it obviously isn't sorted. Hadn't really thought of that. :)
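To illustrate the point about std::set, here is a small sketch that builds sets from unsorted vectors and passes them straight to set_intersection with no explicit sort. The function name intersect_via_sets is hypothetical, and the cost of building the sets is deliberately inside the function so it is not hidden.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Builds std::set objects from unsorted vectors and intersects them.
// No explicit sort is needed: iterating a std::set visits keys in ascending order.
std::vector<int> intersect_via_sets(const std::vector<int>& a, const std::vector<int>& b) {
    std::set<int> sa(a.begin(), a.end());   // ordering happens on insertion
    std::set<int> sb(b.begin(), b.end());

    std::vector<int> out;
    std::set_intersection(sa.begin(), sa.end(), sb.begin(), sb.end(),
                          std::back_inserter(out));
    return out;
}
```

The same call on two unordered_set objects would compile, but the result is not meaningful because their iteration order is unspecified.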

jalf 2009-06-30 00:09:23

Set: yes, so essentially it 'sorts' as you insert.. Insertions are slower into sets than into vectors. So if I started with my unsorted data, it will take me longer to get it into a set than a vector.

Alex Black 2009-06-30 00:12:44

I adjusted the C# code now to shuffle set1; it didn't make any difference. The C# method doesn't care if the sets are sorted or not.

Alex Black 2009-06-30 00:21:38

yeah, I realized the same thing. I implemented the same algorithm again in C++, and so far, getting around 600ms on it, but I think I have a bug or two, so it'll be a few mins before I post it.

jalf 2009-06-30 00:39:26

cool - one thing to do is to output the count of items in the intersection, as a sanity check (it should be ~500)

Alex Black 2009-06-30 00:48:19

yep, it was a bug, was testing intersect(set2, set2) :) Well, it's bedtime for me. Going to add a bit to my answer, and then head off. :)

jalf 2009-06-30 00:52:43

Ok, posted my final update for tonight. No new code or benchmarks, but a few hints on what you should probably try next. I might give it a shot again tomorrow. :)

jalf 2009-06-30 01:11:52

Answer 4

A:

Are C++ optimization flags turned on?

Magnus 2009-06-29 22:31:33

which ones? Optimization is set to "Maximize Speed (/O2)"

Alex Black 2009-06-29 22:32:54

I set "Favor Size or Speed" to "Favor Fast Code (/Ot)", no difference.

Alex Black 2009-06-29 22:34:33

Did you figure out what was taking most of the time?

Magnus 2009-06-29 22:50:10

no I haven't figured out what takes most of the time. In the set_intersection test there isn't much going on except calling set_intersection :)

Alex Black 2009-06-29 22:51:00

Answer 5

A:

Ok, after much feedback I've updated the original question a number of times:

The tests are now each run 1,000 times
The C# code now uses a higher resolution timer
The data structures are now populated BEFORE the tests

The result of this so far is that C# is still ~5x faster than C++.

Thanks everyone for your ideas/suggestions.

Alex Black 2009-06-29 22:32:04

Answer 6

+6 A:

One problem I see right away is that you're passing the sets in C++ by value and not by const reference. So you're copying them every time you pass them around!

Also, I would not use a set for the target of set_intersection. I would use something like

int runSetIntersection(const set<int>& set1, const set<int>& set2)
{   
    vector<int> intersection;
    intersection.reserve(10000); // or whatever the max is

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(intersection));

    return intersection.size(); 
}

This code, however, still allocates inside the function. Even faster would be

int runSetIntersection(const set<int>& set1, const set<int>& set2, vector<int>& scratch)
{   
    scratch.clear();         // drop results left over from a previous call
    scratch.reserve(10000);  // or whatever the max is

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(scratch));

    return scratch.size(); 
}

And then allocate scratch before you start the timer.

Though, if you're just looking for the size, a hand-written for loop, combined with set::find might give even better results.
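A possible shape for that hand-written loop, as a sketch only: walk the smaller set and probe the larger one with set::find. The function name count_common is made up, and whether this beats set_intersection would need to be measured.

```cpp
#include <cstddef>
#include <set>

// Counts common elements by probing the larger set with each element of the smaller one.
// Each find() is O(log n), so the total is O(min * log(max)) with no output allocation.
std::size_t count_common(const std::set<int>& a, const std::set<int>& b) {
    const bool a_is_smaller = a.size() <= b.size();
    const std::set<int>& probe  = a_is_smaller ? a : b;
    const std::set<int>& target = a_is_smaller ? b : a;

    std::size_t count = 0;
    for (std::set<int>::const_iterator it = probe.begin(); it != probe.end(); ++it) {
        if (target.find(*it) != target.end())
            ++count;
    }
    return count;
}
```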

rlbond 2009-06-29 22:49:00

Good spotting! That ought to account for some of the slowness.

Andreas Magnusson 2009-06-29 22:52:15

Answer 7

A:

Update:

I modified the set_intersection code to use vectors, and to sort them (instead of using the sorted set class), and it's MUCH faster now:

Found the intersection of 319 values (using unordered_map) 1000 times, in 22187.5ms
Found the intersection of 315 values (using set_intersection) 1000 times, in 2401.62ms

Keep in mind: the larger set is created sorted, so sorting it might not take much time in this example.

C++ Code:

// MapPerformance.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <hash_map>
#include <vector>
#include <iostream>
#include <time.h>
#include <algorithm>
#include <set>

#include <boost\unordered\unordered_map.hpp>

#include "timer.h"

using namespace std;
using namespace stdext;
using namespace boost;

int runIntersectionTest(vector<int> set1, vector<int> set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_map<int,int> theMap;

    // Now intersect the two sets by populating the map
    for ( vector<int>::iterator iterator = set1.begin(); iterator != set1.end(); iterator++ )
    {
     int value = *iterator;

     theMap[value] = 1;
    }

    int intersectionSize = 0;

    for ( vector<int>::iterator iterator = set2.begin(); iterator != set2.end(); iterator++ )
    {
     int value = *iterator;

     unordered_map<int,int>::iterator foundValue = theMap.find(value);

     if ( foundValue != theMap.end() )
     {
      theMap[value] = 2;

      intersectionSize++;
     }
    }

    return intersectionSize;

}

int runSetIntersection(vector<int> set1, vector<int> set2)
{   
    sort(set1.begin(),set1.end());
    sort(set2.begin(),set2.end());

    set<int> intersection;

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));

    return intersection.size(); 
}



int _tmain(int argc, _TCHAR* argv[])
{
    srand ( time(NULL) );

    vector<int> set1;
    vector<int> set2;

    set1.reserve(10000);
    set2.reserve(1000);

    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
     int value = 1000000000 + i;
     set1.push_back(value);
    }

    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
     int random = rand() % 200000 + 1;
     random *= 10;

     int value = 1000000000 + random;
     set2.push_back(value);
    }

    int intersectionSize = 0;


    Timer timer;
    for ( int i = 0; i < 1000; i++ )
    {
     intersectionSize = runIntersectionTest(set1, set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
     intersectionSize = runSetIntersection(set1,set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    getchar();

    return 0;
}

Alex Black 2009-06-29 22:50:32

Andreas Magnusson 2009-06-29 22:53:09

good call, sloppy mistake on my part trying to rapidly change the code.

Alex Black 2009-06-29 22:57:59

You could write the intersection results to a vector instead of a set (should be a lot faster). Also I'd expect unordered_set to be a lot faster than plain sets

jalf 2009-06-29 23:42:32

Answer 8

+2 A:

I would change the C++ "runIntersectionTest" to take const references to the containers rather than having them copy-constructed on each call. (The C# code will be using refs.)
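To make the cost of pass-by-value visible, here is a small sketch using a hypothetical element type, Tracked, that counts its own copies. It is not taken from the benchmark code; it only shows that a by-value vector parameter copies every element on each call, while a const reference copies none.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct Tracked {
    static int copies;
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked& operator=(const Tracked& other) {
        value = other.value;
        ++copies;
        return *this;
    }
};
int Tracked::copies = 0;

// By value: the whole vector (and every element) is copy-constructed per call.
std::size_t size_by_value(std::vector<Tracked> v) { return v.size(); }

// By const reference: no element is copied.
std::size_t size_by_const_ref(const std::vector<Tracked>& v) { return v.size(); }
```

With a 100,000-element vector and 1000 timed iterations, the by-value version performs 100 million element copies that have nothing to do with the intersection itself.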

2009-06-29 22:55:20

good call, I'll do that.

Alex Black 2009-06-29 22:56:57

Answer 9

+1 A:

Since you're using Visual Studio, you should check whether you have _SECURE_SCL set to 1 (typically, if you haven't explicitly set it, it will be 1). If it's set, all STL code will be range-checked, even in release builds, typically slowing down code by 10-15%.

It seems Microsoft wasn't aware that for instance std::vector already has an interface if you want the range-checking: std::vector::at()!

(Sorry, had to get it off my chest).
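For reference, a short sketch of the opt-in range checking mentioned above: std::vector::at() throws std::out_of_range for an invalid index, whereas operator[] performs no check in standard C++. The helper name at_throws is made up.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if v.at(index) throws std::out_of_range, false if the access succeeds.
bool at_throws(const std::vector<int>& v, std::size_t index) {
    try {
        (void)v.at(index);   // checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```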

Anyway, the main inefficiency is that you're copying the containers instead of passing them by reference. Use references to (try to) compare apples and apples instead of apples and bananas.

Andreas Magnusson 2009-06-29 22:57:58

I'll adjust and repost without copying... I have set _SECURE_SCL to 0. (#define _SECURE_SCL 0)

Alex Black 2009-06-29 23:00:50

Answer 10

+2 A:

It may also be worthwhile looking at the boost Disjoint Set container, which is specially optimized for certain kinds of large set operations.

It works by treating a group of sets as the unions of several disjoint sets, making it possible to build other sets, such as intersections or unions very cheaply, once the initial set of disjoint sets is constructed. If you expect to be doing a lot of set operations on sets that don't change much, you can probably expect this to be very fast. If, on the other hand, you will use each set once and throw it away, it's probably not going to do too much.

Anyway, you'd be doing yourself a favor to at least experiment with this to see if it gives you any bump in your specific case.

TokenMacGuy 2009-06-29 23:11:45

This structure has just given me a chill, for it contains the first practical use of the Ackermann function, or rather its inverse. Amazing!

TokenMacGuy 2009-06-29 23:14:34

Answer 11

A:

Ok, here is the latest, with some changes:

The C++ sets are now properly set up so they have a 50% intersection (like the C#)
Set1 is shuffled so it's not sorted; set2 was already not sorted
The set_intersection implementation now uses vectors, and sorts them first

C++ (Release, x64) Results:

Found the intersection of 503 values (using unordered_map) 1000 times, in 35131.1ms
Found the intersection of 494 values (using set_intersection) 1000 times, in 10317ms

So it's 2x slower than C#. @Jalf: You're getting some pretty fast numbers, is there something I'm doing wrong here?

C++ Code:

// MapPerformance.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <hash_map>
#include <vector>
#include <iostream>
#include <time.h>
#include <algorithm>
#include <set>

#include <boost\unordered\unordered_map.hpp>

#include "timer.h"

using namespace std;
using namespace stdext;
using namespace boost;

int runIntersectionTest(const vector<int>& set1, const vector<int>& set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_map<int,int> theMap; 

    vector<int>::const_iterator set1_end = set1.end();

    // Now intersect the two sets by populating the map
    for ( vector<int>::const_iterator iterator = set1.begin(); iterator != set1_end; ++iterator )
    {
     int value = *iterator;

     theMap[value] = 1;
    }

    int intersectionSize = 0;

    vector<int>::const_iterator set2_end = set2.end();

    for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
    {
     int value = *iterator;

     unordered_map<int,int>::iterator foundValue = theMap.find(value);

     if ( foundValue != theMap.end() )
     {
      theMap[value] = 2;

      intersectionSize++;
     }
    }

    return intersectionSize;

}

int runSetIntersection(const vector<int> set1_unsorted, const vector<int> set2_unsorted)
{   
    // Create two vectors
    std::vector<int> set1(set1_unsorted.size());
    std::vector<int> set2(set2_unsorted.size());

    // Copy the unsorted data into them
    std::copy(set1_unsorted.begin(), set1_unsorted.end(), set1.begin());
    std::copy(set2_unsorted.begin(), set2_unsorted.end(), set2.begin());

    // Sort the data
    sort(set1.begin(),set1.end());
    sort(set2.begin(),set2.end());

    vector<int> intersection;
    intersection.reserve(1000);

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), inserter(intersection, intersection.end()));

    return intersection.size(); 
}

void createSets( vector<int>& set1, vector<int>& set2 )
{
    srand ( time(NULL) );

    set1.reserve(100000);
    set2.reserve(1000);

    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
     int value = 1000000000 + i;
     set1.push_back(value);
    }

    // Try to get half of our values intersecting
    float ratio = 200000.0f / RAND_MAX;


    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
     int random = rand() * ratio + 1;

     int value = 1000000000 + random;
     set2.push_back(value);
    }

    // Make sure set1 is in random order (not sorted)
    random_shuffle(set1.begin(),set1.end());
}

int _tmain(int argc, _TCHAR* argv[])
{
    int intersectionSize = 0;

    vector<int> set1, set2; 
    createSets( set1, set2 );

    Timer timer;
    for ( int i = 0; i < 1000; i++ )
    {
     intersectionSize = runIntersectionTest(set1, set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
     intersectionSize = runSetIntersection(set1,set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    getchar();

    return 0;
}

Alex Black 2009-06-29 23:33:16

I had a bug in my initialization code for a while, which gave me far too fast results. Fixed that now. As far as I can see, the performance we've got now is basically: presorted vectors > unordered set > set > C# > unsorted vectors. Which means you have at least two valid options that perform better than C#.

jalf 2009-06-30 00:04:09

presorted vectors and set don't include the sort time in the benchmark... my data is not sorted. unordered_set: see my question on the other thread, are you sure this works? doesn't set_intersection require a sorted input?

Alex Black 2009-06-30 00:09:29

These updates should be edits on your original post.

GMan 2009-06-30 01:04:29

I put my latest updates in the original post.

Alex Black 2009-06-30 01:41:48

Answer 12

A:

You are STILL passing the vectors by value. Which would be ok if you weren't copying them as well.

inserter was not putting the values at the end of the vector, where it is quick. It only did that on the first insert; after that it inserted the value at the beginning of the array (where end used to point).

You were looking up the value twice in the hash map version when you updated the value. Why is this value even being updated?

Run this code and post your timings.

// MapPerformance.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <hash_map>
#include <vector>
#include <iostream>
#include <time.h>
#include <algorithm>
#include <set>

#include <boost\unordered\unordered_set.hpp>

#include "timer.h"

using namespace std;
using namespace stdext;
using namespace boost;

int runIntersectionTest(const vector<int>& set1, const vector<int>& set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_set<int> theSet;      

     theSet.insert( set1.begin(), set1.end() );

    int intersectionSize = 0;

    vector<int>::const_iterator set2_end = set2.end();

    for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
    {
        if ( theSet.find(*iterator) != theSet.end() )
        {
                intersectionSize++;
        }
    }

    return intersectionSize;
}

int runSetIntersection( vector<int> set1, vector<int> set2)
{   
    // Sort the data
    sort(set1.begin(),set1.end());
    sort(set2.begin(),set2.end());

    vector<int> intersection;
    intersection.reserve(1000);

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(intersection));

    return intersection.size(); 
}

void createSets( vector<int>& set1, vector<int>& set2 )
{
    srand ( time(NULL) );

    set1.reserve(100000);
    set2.reserve(1000);

    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
        int value = 1000000000 + i;
        set1.push_back(value);
    }

    // Try to get half of our values intersecting
    float ratio = 200000.0f / RAND_MAX;


    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
        int random = rand() * ratio + 1;

        int value = 1000000000 + random;
        set2.push_back(value);
    }

    // Make sure set1 is in random order (not sorted)
    random_shuffle(set1.begin(),set1.end());
}

int _tmain(int argc, _TCHAR* argv[])
{
    int intersectionSize = 0;

    vector<int> set1, set2;     
    createSets( set1, set2 );

    Timer timer;
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runIntersectionTest(set1, set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runSetIntersection(set1,set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    getchar();

    return 0;
}

caspin 2009-06-30 00:35:36

I made those changes, no noticeable difference. Your new runIntersectionTest is similar in performance to the unordered_map one (about 2x slower than set_intersection)

Alex Black 2009-06-30 01:43:15

Answer 13

A:

Latest benchmark:

Found the intersection of 504 values (using unordered_map) 1000 times, in 28827.6ms
Found the intersection of 495 values (using set_intersection) 1000 times, in 9817.69ms
Found the intersection of 504 values (using unordered_set) 1000 times, in 24769.1ms

I think the 504 vs. 495 difference happens because there are a couple of duplicate values.
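A small sketch of why duplicates would produce different counts, assuming (as in this benchmark) that set1 holds unique values and set2 may repeat some. The function names are hypothetical. The hash-probe loop counts every matching element of the second input, while set_intersection emits a value min(m, n) times when it occurs m times in one input and n times in the other.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <unordered_set>
#include <vector>

// Hash-probe count: every element of b found in a is counted, duplicates included.
std::size_t count_by_probe(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_set<int> lookup(a.begin(), a.end());
    std::size_t count = 0;
    for (std::size_t i = 0; i < b.size(); ++i) {
        if (lookup.count(b[i]) != 0)
            ++count;
    }
    return count;
}

// set_intersection count: parameters are taken by value on purpose,
// because the local copies are sorted before intersecting.
std::size_t count_by_set_intersection(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out.size();
}
```

With a = {1, 2, 3, 4} and b = {2, 2, 4, 9}, the probe version counts 3 and the set_intersection version counts 2.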

Code:

// MapPerformance.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <hash_map>
#include <vector>
#include <iostream>
#include <time.h>
#include <algorithm>
#include <set>
#include <unordered_set>

#include <boost\unordered\unordered_map.hpp>

#include "timer.h"

using namespace std;
using namespace stdext;
using namespace boost;
using namespace tr1;


int runIntersectionTest2(const vector<int>& set1, const vector<int>& set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_set<int> theSet;      

     theSet.insert( set1.begin(), set1.end() );

    int intersectionSize = 0;

    vector<int>::const_iterator set2_end = set2.end();

    for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
    {
        if ( theSet.find(*iterator) != theSet.end() )
        {
                intersectionSize++;
        }
    }

    return intersectionSize;
}

int runIntersectionTest(const vector<int>& set1, const vector<int>& set2)
{
    // hash_map<int,int> theMap;
    // map<int,int> theMap;
    unordered_map<int,int> theMap; 

    vector<int>::const_iterator set1_end = set1.end();

    // Now intersect the two sets by populating the map
    for ( vector<int>::const_iterator iterator = set1.begin(); iterator != set1_end; ++iterator )
    {
        int value = *iterator;

        theMap[value] = 1;
    }

    int intersectionSize = 0;

    vector<int>::const_iterator set2_end = set2.end();

    for ( vector<int>::const_iterator iterator = set2.begin(); iterator != set2_end; ++iterator )
    {
        int value = *iterator;

        unordered_map<int,int>::iterator foundValue = theMap.find(value);

        if ( foundValue != theMap.end() )
        {
            theMap[value] = 2;

            intersectionSize++;
        }
    }

    return intersectionSize;

}

int runSetIntersection(const vector<int>& set1_unsorted, const vector<int>& set2_unsorted)
{   
    // Create two vectors
    std::vector<int> set1(set1_unsorted.size());
    std::vector<int> set2(set2_unsorted.size());

    // Copy the unsorted data into them
    std::copy(set1_unsorted.begin(), set1_unsorted.end(), set1.begin());
    std::copy(set2_unsorted.begin(), set2_unsorted.end(), set2.begin());

    // Sort the data
    sort(set1.begin(),set1.end());
    sort(set2.begin(),set2.end());

    vector<int> intersection;
    intersection.reserve(1000);

    set_intersection(set1.begin(),set1.end(), set2.begin(), set2.end(), back_inserter(intersection));

    return intersection.size(); 
}

void createSets( vector<int>& set1, vector<int>& set2 )
{
    srand ( time(NULL) );

    set1.reserve(100000);
    set2.reserve(1000);

    // Create 100,000 values for set1
    for ( int i = 0; i < 100000; i++ )
    {
        int value = 1000000000 + i;
        set1.push_back(value);
    }

    // Try to get half of our values intersecting
    float ratio = 200000.0f / RAND_MAX;


    // Create 1,000 values for set2
    for ( int i = 0; i < 1000; i++ )
    {
        int random = rand() * ratio + 1;

        int value = 1000000000 + random;
        set2.push_back(value);
    }

    // Make sure set1 is in random order (not sorted)
    random_shuffle(set1.begin(),set1.end());
}

int _tmain(int argc, _TCHAR* argv[])
{
    int intersectionSize = 0;

    vector<int> set1, set2; 
    createSets( set1, set2 );

    Timer timer;
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runIntersectionTest(set1, set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using unordered_map) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runSetIntersection(set1,set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using set_intersection) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    timer.Reset();
    for ( int i = 0; i < 1000; i++ )
    {
        intersectionSize = runIntersectionTest2(set1,set2);
    }
    timer.Stop();

    cout << "Found the intersection of " << intersectionSize << " values (using unordered_set) 1000 times, in " << timer.GetMilliseconds() << "ms" << endl;

    getchar();

    return 0;
}
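
The listing above depends on a "timer.h" that is not shown in the post. For readers who want to run it, the following is a rough stand-in, inferred only from the calls the code makes (construction, Reset(), Stop(), GetMilliseconds()). It is an assumption, not the original header, and it uses C++11's std::chrono::steady_clock, which postdates the toolchain used in the thread.

```cpp
#include <chrono>

// Hypothetical replacement for the unseen timer.h.
// Timing starts at construction, because the benchmark calls Stop()
// on a freshly constructed Timer without any explicit start call.
class Timer
{
    typedef std::chrono::steady_clock Clock;

public:
    Timer() { Reset(); }

    // Restart the measurement from the current instant.
    void Reset()
    {
        start_ = Clock::now();
        end_ = start_;
        running_ = true;
    }

    // Freeze the measurement at the current instant.
    void Stop()
    {
        end_ = Clock::now();
        running_ = false;
    }

    // Elapsed time in milliseconds: up to Stop() if stopped,
    // otherwise up to the moment of the call.
    double GetMilliseconds() const
    {
        const Clock::time_point end = running_ ? Clock::now() : end_;
        return std::chrono::duration<double, std::milli>(end - start_).count();
    }

private:
    Clock::time_point start_;
    Clock::time_point end_;
    bool running_;
};
```

steady_clock is chosen here because it never jumps backwards, which matters more for benchmarks than wall-clock accuracy. Whether the original timer behaved the same way is unknown.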

Alex Black 2009-06-30 00:55:31

The code I posted deliberately didn't pass the vectors by reference, so the compiler would implicitly copy them. If you'd like to do it your way, replace the code before the sort with this:

    vector<int> set1( unsorted_set1 );
    vector<int> set2( unsorted_set2 );

This way the vectors are efficiently copied. The code as it stands fills each vector with zeros, then copies the desired values over the top. The above method allocates enough space and then copies the values from the parameter with no extra assignments. This will speed things up, though I don't know by how much.
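
A minimal side-by-side sketch of the two copy strategies discussed in this comment. The function names are hypothetical and chosen for illustration; no claim is made here about how large the speed difference is in practice.

```cpp
#include <algorithm>
#include <vector>

// Strategy used in the posted runSetIntersection: the sized constructor
// value-initializes every element to 0, then std::copy overwrites them.
std::vector<int> copyViaFillThenOverwrite(const std::vector<int>& source)
{
    std::vector<int> result(source.size());
    std::copy(source.begin(), source.end(), result.begin());
    return result;
}

// Strategy suggested in the comment: the copy constructor allocates once
// and copies the elements directly, with no intermediate zero fill.
std::vector<int> copyViaCopyConstructor(const std::vector<int>& source)
{
    std::vector<int> result(source);
    return result;
}

// Copy-construct a private copy and sort it; the caller's data stays untouched.
std::vector<int> sortedCopy(const std::vector<int>& source)
{
    std::vector<int> result(source);
    std::sort(result.begin(), result.end());
    return result;
}
```

Both copy functions return identical contents; the only difference is the extra pass that writes zeros in the first one. Taking the parameter by value, as the comment describes, has the same effect as the copy constructor.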

caspin 2009-06-30 02:00:19

Alex Black 2009-06-30 02:14:58

Answer 14

A:

Corwin Joy 2010-08-26 03:38:27


Fast intersection of sets: C++ vs C#
