views:

241

answers:

7

I am trying to compare two vector objects, and return a single vector containing all the chars which appear in both vectors.

How would I go about this without writing some horribly complex manual method which compares every char in the first vector to every char in the second vector and using an if to add it to a third vector (which would be returned) if they match.

Maybe my lack of real experience with vectors is making me imagine this will be harder than it really is, but I suspect there is some simplier way which I have been unable to find through searching.

A: 

Maybe you should be using std::strings instead of vectors, if you have characters in them? Strings have plenty of functionality for searching etc.

Tronic
+7  A: 

I think you're looking for std::set_intersection. The source vectors have to be sorted though. If you don't care about the order of your output vector, you could always run it on sorted copies of your source vectors.

And BTW, the manual naive way isn't horribly complex. Given two source vectors s1 and s2, and a destination vector dest, you could write something that looks like this:

for (std::vector<char>::iterator i = s1.begin(); i != s1.end(); ++i)
{
    if (std::find(s2.begin(), s2.end(), *i) != s2.end())
    {
        dest.push_back(*i);
    }
}

You have a lot of options for the find step depending on your choice of data structure.

Kristo
Thank you. I was expecting it to be something like this.
Sam Phelps
`set_intersection` only works if both vectors are sorted.
Jon-Eric
@Jon-Eric: I believe Kristo already said that....
Billy ONeal
@BillyONeal Kristo's answer didn't include that part when I left the comment.
Jon-Eric
@Jon-Eric: Actually, it did. The first revision has that information. Check the edit history if you don't believe me.
Billy ONeal
Ok guys, no need to argue. I built up my answer with several edits in quick succession. Apparently, not all of them were saved as separate revisions by the SO engine.
Kristo
It is important to note that `binary_search` requires the range to be ordered, so that manual implementation will only work if `s2` is in fact ordered. If none of the ranges is ordered you can use `std::find` instead of `std::binary_search` which will be slower (`O( n^2 )` for the whole algorithm) but does not require ordering.
David Rodríguez - dribeas
@David, good catch. I think I was updating my answer as you wrote your comment. :-)
Kristo
+1  A: 
int temp[5000]; // declare this globally if you're going to be 
                // doing a lot of set_intersection calls   

int main() {

  char x[]={'a','b','c','d','e'};
  char y[]={'b','c','g'};
  vector<char> v1(x,x+sizeof x/sizeof x[0]);
  vector<char> v2(y,y+sizeof y/sizeof y[0]);
  sort(v1.begin(),v1.end());
  sort(v2.begin(),v2.end());  // the vectors *must* be sorted!!!!!!

  vector<char> inter=vector<char>(temp,set_intersection(v1.begin(),v1.end(),v2.begin(),v2.end(),temp)); // inter contains {'b','c'}
  int cnt=set_intersection(v1.begin(),v1.end(),v2.begin(),v2.end(),temp) - temp;    // cnt=2

  for(int i = 0; i < (int)inter.size(); ++i) {
    cout<<inter[i]<<" ";
  }
  cout<<endl;

  return 0;
}
dcp
Let me check I understand this, as I think this has helped me understand the stuff about set_intersection I have found since posting the question.inter contains b and c, which are the characters common to x and y right?
Sam Phelps
@Sam Phelps - Yes, that is right. And cnt contains the number of elements that are in the intersection (I just put that there in case you only needed to get the count of intersected elements for some reason).
dcp
It might be clearer to use insert iterators instead of allocating a fixed size array for your destination vector.
Kristo
+1  A: 

Use set_intersection. Here's a working sample:

#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

using namespace std;

int main()
{
    vector<string> v1;
    v1.push_back("Mary");
    v1.push_back("had");
    v1.push_back("a");

    vector<string> v2;
    v2.push_back("a");
    v2.push_back("little");
    v2.push_back("lamb");

    sort(v1.begin(), v1.end());
    sort(v2.begin(), v2.end());

    vector<string> v3;
    set_intersection(v1.begin(), v1.end(), v2.begin(), v2.end(), back_inserter(v3));

    copy(v3.begin(), v3.end(), ostream_iterator<string>(cout, "\r\n"));
    return 0;
}
John Dibling
+3  A: 

If I had to do this on two unsorted vectors (without library help), I think I'd add all the elements of one to a hashtable then iterate through the second looking up each--should be more efficient than sorting both lists first as well.

Bill K
+1  A: 
andand
A: 

Since it turns out from your later question you only actually care about 26 characters:

std::bitset<26> in;
for (std::vector<char>::iterator it = first.begin(); it != first.end(); ++it) {
    in[*it - 'a'] = true;
}
for (std::vector<char>::iterator it = second.begin(); it != second.end(); ++it) {
    if (in[*it - 'a']) {
        result.push_back(*it);
        // this line is only needed if 'second' can contain duplicates
        in[*it - 'a'] = false;
    }
}

In fact a bitset<UCHAR_MAX> is small on almost all architectures. Just watch out for those DSPs with 32-bit chars, and be cautious adapting this technique to wchar_t.

With BOOST_FOREACH, the code even looks reasonable:

assert(UCHAR_MAX <= 512 && "What kind of crazy machine is this?");
std::bitset<UCHAR_MAX> in;

BOOST_FOREACH(unsigned char c, first) {
    in[c] = true;
}

BOOST_FOREACH(unsigned char c, second) {
    if (in[c]) {
        result.push_back(c);
        // this line is only needed if 'second' can contain duplicates
        in[c] = false;
    }
}
Steve Jessop