tags:

views:

148

answers:

4

Suppose I have the following two data structures:

std::vector<int> all_items;
std::set<int> bad_items;

The all_items vector contains all known items and the bad_items vector contains a list of bad items. These two data structures are populated entirely independent of one another.

What's the proper way to write a method that will return a std::vector<int> contain all elements of all_items not in bad_items?

Currently, I have a clunky solution that I think can be done more concisely. My understanding of STL function adapters is lacking. Hence the question. My current solution is:

struct is_item_bad {
    std::set<int> const* bad_items;
    bool operator() (int const i) const {
        return bad_items.count(i) > 0;
    }
};

std::vector<int> items() const {
    is_item_bad iib = { &bad_items; };
    std::vector<int> good_items(all_items.size());
    std::remove_copy_if(all_items.begin(),  all_items.end(), 
                        good_items.begin(), is_item_bad);
    return good_items; 
}

Assume all_items, bad_items, is_item_bad and items() are all a part of some containing class. Is there a way to write them items() getter such that:

  • It doesn't need temporary variables in the method?
  • It doesn't need the custom functor, struct is_item_bad?

I had hoped to just use the count method on std::set as a functor, but I haven't been able to divine the right way to express that w/ the remove_copy_if algorithm.

EDIT: Fixed the logic error in items(). The actual code didn't have the problem, it was a transcription error.

EDIT: I have accepted a solution that doesn't use std::set_difference since it is more general and will work even if the std::vector isn't sorted. I chose to use the C++0x lambda expression syntax in my code. My final items() method looks like this:

std::vector<int> items() const {
    std::vector<int> good_items;
    good_items.reserve(all_items.size());
    std::remove_copy_if(all_items.begin(), all_items.end(),
                        std::back_inserter(good_items),
                        [&bad_items] (int const i) {
                            return bad_items.count(i) == 1;
                        });
}

On a vector of about 8 million items the above method runs in 3.1s. I bench marked the std::set_difference approach and it ran in approximately 2.1s. Thanks to everyone who supplied great answers.

+1  A: 

std::remove_copy_if returns an iterator to the target collection. In this case, it would return good_items.end() (or something similar). good_items goes out of scope at the end of the method, so this would cause some memory errors. You should return good_items or pass in a new vector<int> by reference and then clear, resize, and populate it. This would get rid of the temporary variable.

I believe you have to define the custom functor because the method depends on the object bad_items which you couldn't specify without it getting hackey AFAIK.

Dave
Thanks Dave. This was an input error on my part. The actual code does return `good_items`. I've fixed it in the question.
lrm
+8  A: 

As jeffamaphone suggested, if you can sort any input vectors, you can use std::set_difference which is efficient and less code:

#include <algorithm>
#include <set>
#include <vector>

std::vector<int> 
get_good_items( std::vector<int> const & all_items,
                std::set<int> const & bad_items )
{
    std::vector<int> good_items;

    // Assumes all_items is sorted.
    std::set_difference( all_items.begin(),
                         all_items.end(),
                         bad_items.begin(),
                         bad_items.end(),
                         std::back_inserter( good_items ) );

    return good_items;
}
Jon-Eric
This will only work if the input vector is ordered.
David Rodríguez - dribeas
@dribeas: I'm pretty sure that's what he meant when he said: "...if you can sort any input vectors..."
Jerry Coffin
The vector is ordered. The data arrives to me in an ordered fashion and I insert it into the vector as such. I didn't use a set for this because I didn't want to add additional set comparator overhead for data that was already naturally ordered. Was that a mistake?
lrm
No, Lrm, that's fine, as long as the order in the vector is the same order they *would* be in if you'd used a set instead of the vector.
Rob Kennedy
+2  A: 

Since your function is going to return a vector, you will have to make a new vector (i.e. copy elements) in any case. In which case, std::remove_copy_if is fine, but you should use it correctly:

#include <iostream>
#include <vector>
#include <set>
#include <iterator>
#include <algorithm>
#include <functional>
std::vector<int> filter(const std::vector<int>& all, const std::set<int>& bad)
{
        std::vector<int> result;
        remove_copy_if(all.begin(), all.end(), back_inserter(result),
                  [&bad](int i){return bad.count(i)==1;});
        return result;
}

int main()
{
        std::vector<int> all_items = {4,5,2,3,4,8,7,56,4,2,2,2,3};
        std::set<int> bad_items = {2,8,4};
        std::vector<int> filtered_items = filter(all_items, bad_items);
        copy(filtered_items.begin(), filtered_items.end(), std::ostream_iterator<int>(std::cout, " "));
        std::cout << std::endl;
}

To do this in C++98, I guess you could use mem_fun_ref and bind1st to turn set::count into a functor in-line, but there are issues with that (which resulted in deprecation of bind1st in C++0x) which means depending on your compiler, you might end up using std::tr1::bind anyway:

remove_copy_if(all.begin(), all.end(), back_inserter(result),
     bind(&std::set<int>::count, bad, std::tr1::placeholders::_1)); // or std::placeholders in C++0x

and in any case, an explicit function object would be more readable, I think:

struct IsMemberOf {
        const std::set<int>& bad;
        IsMemberOf(const std::set<int>& b) : bad(b) {}
        bool operator()(int i) const { return bad.count(i)==1;}
};
std::vector<int> filter(const std::vector<int>& all, const std::set<int>& bad)
{
        std::vector<int> result;
        remove_copy_if(all.begin(), all.end(), back_inserter(result), IsMemberOf(bad));
        return result;
}
Cubbi
+1... obviously, since this is about the same I had written myself and more complete!
David Rodríguez - dribeas
lrm
@lrm: Yes, the first example uses C++0x lambda expressions. It builds a function object that captures "bad" by reference and exposes `bool operator()(int i) const` -- just like what you wrote originally, except using a reference rather than a pointer. It compiles with gcc 4.5+ (and, I imagine, MSVC10). The second bit uses C++TR1 generalized binders (available in std namespace of gcc 4.4+ in c++0x mode and also available as boost.bind for all C++98 compilers), and the third uses basic C++98.
Cubbi
+2  A: 
Craig W. Wright
Thanks for the answer Craig. I'm trying to use the appropriate STL algorithm as it is item #43 of Meyer's effective STL.
lrm
@lrm: "appropriate STL algorithm" ... there are too much STL algorithms/etc for normal people to be able to memorize even half of them. I normally would write same as Craig. That is also K.I.S.S. compliant. Yet as the input vector is sorted, the code above is O(log(n)) while it can be done in O(n) by simply iterating the structures in parallel.
Dummy00001