Does anyone know whether it's OK to pass a boost::unordered_set as the first argument to boost::split? With libboost1.42-dev, this seems to cause problems. Here's a small example program that reproduces the problem; call it test-split.cc:

#include <boost/algorithm/string/classification.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/unordered_set.hpp>
#include <string>

int main(int argc, char **argv) {
  boost::unordered_set<std::string> tags_set;
  boost::split(tags_set, "a^b^c^",
               boost::is_any_of(std::string(1, '^')));
  return 0;
}

Then, if I run the following commands:

g++ -o test-split test-split.cc; valgrind ./test-split

I get a bunch of complaints in valgrind like the one that follows (I also sometimes see coredumps without valgrind, though it seems to vary based on timing):

==16843== Invalid read of size 8
==16843==    at 0x4ED07D3: std::string::end() const (in /usr/lib/libstdc++.so.6.0.13)
==16843==    by 0x401EE2: unsigned long boost::hash_value<char, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /tmp/test-split)
...
==16843==    by 0x402248: boost::unordered_set<std::string, boost::hash<std::string>, std::equal_to<std::string>, std::allocator<std::string> >& boost::algorithm::split<boost::unordered_set<std::string, boost::hash<std::string>, std::equal_to<std::string>, std::allocator<std::string> >, char const [26], boost::algorithm::detail::is_any_ofF<char> >(boost::unordered_set<std::string, boost::hash<std::string>, std::equal_to<std::string>, std::allocator<std::string> >&, char const (&) [26], boost::algorithm::detail::is_any_ofF<char>, boost::algorithm::token_compress_mode_type) (in /tmp/test-split)
==16843==    by 0x40192A: main (in /tmp/test-split)
==16843==  Address 0x5936610 is 0 bytes inside a block of size 32 free'd
==16843==    at 0x4C23E0F: operator delete(void*) (vg_replace_malloc.c:387)
==16843==    by 0x4ED1EE8: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (in /usr/lib/libstdc++.so.6.0.13)
==16843==    by 0x404A8B: void boost::unordered_detail::hash_unique_table<boost::unordered_detail::set<boost::hash<std::string>, std::equal_to<std::string>, std::allocator<std::string> > >::insert_range_impl<boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::string, char const*>, boost::algorithm::split_iterator<char const*>, boost::use_default, boost::use_default> >(std::string const&, boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::string, char const*>, boost::algorithm::split_iterator<char const*>, boost::use_default, boost::use_default>, boost::transform_iterator<boost::algorithm::detail::copy_iterator_rangeF<std::string, char const*>, boost::algorithm::split_iterator<char const*>, boost::use_default, boost::use_default>) (in /tmp/test-split)
...
==16843==    by 0x402248: boost::unordered_set<std::string, boost::hash<std::string>, std::equal_to<std::string>, std::allocator<std::string> >& boost::algorithm::split<boost::unordered_set<std::string, boost::hash<std::string>, std::equal_to<std::string>, std::allocator<std::string> >, char const [26], boost::algorithm::detail::is_any_ofF<char> >(boost::unordered_set<std::string, boost::hash<std::string>, std::equal_to<std::string>, std::allocator<std::string> >&, char const (&) [26], boost::algorithm::detail::is_any_ofF<char>, boost::algorithm::token_compress_mode_type) (in /tmp/test-split)
==16843==    by 0x40192A: main (in /tmp/test-split)

This is a Debian Squeeze box; here's my relevant system info:

$ g++ --version
g++ (Debian 4.4.5-2) 4.4.5
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ dpkg -l | grep boost
ii  libboost-iostreams1.42.0            1.42.0-4                     Boost.Iostreams Library
ii  libboost1.42-dev                    1.42.0-4                     Boost C++ Libraries development files
$ uname -a
Linux gcc44-buildvm 2.6.32-5-amd64 #1 SMP Fri Sep 17 21:50:19 UTC 2010 x86_64 GNU/Linux

However, the code seems to work fine if I downgrade libboost1.42-dev to libboost1.40-dev. So is this a bug in boost 1.42, or am I misusing boost::split by passing in a container that can't handle sequences? Thanks!

A: 

Apparently, the answer is yes (I originally said no).

Using the following code, I get compile-time warnings and a runtime assert (Visual C++ v10) on the unordered_set while the vector works fine (apart from an empty string in the last element, due to the trailing '^').

boost::unordered_set<std::string> tags_set;
std::vector<std::string> SplitVec;
boost::split( SplitVec, "a^b^c^", boost::is_any_of("^") ); // works; last element is ""
boost::split( tags_set, "a^b^c^", boost::is_any_of("^") ); // compile-time warnings, runtime assert

Iterator compatibility between the source (the string) and the target container appears to be the issue. I would post the warning text, but it's one of those "War and Peace" template warnings.
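On the empty last element mentioned above: here is a minimal standard-library sketch of why a trailing delimiter leaves an empty final token. The helper name split_on is hypothetical and this is not Boost's implementation; it only illustrates the splitting rule, where every delimiter ends one token and starts another.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, NOT boost::split: splits on a single delimiter
// character and keeps empty tokens.
inline std::vector<std::string> split_on(const std::string& input, char delim) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = input.find(delim, start);
        if (pos == std::string::npos) {
            // Whatever follows the last delimiter is a token too,
            // even when it is the empty string.
            tokens.push_back(input.substr(start));
            break;
        }
        tokens.push_back(input.substr(start, pos - start));
        start = pos + 1;  // continue just past the delimiter
    }
    return tokens;
}
```

With this sketch, split_on("a^b^c^", '^') yields four tokens, the last of which is empty, because the final '^' still opens one more (empty) token.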

EDIT:

This looks like a bug in boost::unordered_set. When I use std::unordered_set instead, it works as you would expect:

std::unordered_set<std::string> tags_set_std;
boost::split( tags_set_std, std::string("a^b^c^"), boost::is_any_of(std::string("^")) );
Steve Townsend
Thanks Steve. What version of boost are you using?
Jeremy Stribling
@Jeremy - 1.44.0
Steve Townsend
@Jeremy - see EDIT
Steve Townsend
A: 

I think the answer should be yes.

Reading the headers (split.hpp and iter_find.hpp), split takes a SequenceSequenceT& Result as its first argument and passes it to iter_split, which range-constructs a temporary from two boost::transform_iterators and swaps it into Result:

SequenceSequenceT Tmp(itBegin, itEnd);
Result.swap(Tmp);
return Result;

So all it needs of this type is a constructor that takes a pair of iterators which dereference to std::string (or, technically, to BOOST_STRING_TYPENAME), a .swap() member, and a SequenceSequenceT::iterator type whose value type is std::string.

proof:

#include <boost/algorithm/string/classification.hpp>
#include <boost/algorithm/string/split.hpp>
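#include <string>
#include <cstddef>
#include <iterator>
#include <algorithm>
#include <iostream>
// Minimal type that provides only what split appears to need
struct X
{
   // iterator typedef whose value_type is std::string
   typedef std::iterator<std::forward_iterator_tag,
           std::string, std::ptrdiff_t, std::string*, std::string&>
           iterator;
   X() {}
   // range constructor: prints every token it is handed
   template<typename Iter> X(Iter i1, Iter i2)
   {
       std::cout << "Constructed X: ";
       std::copy(i1, i2, std::ostream_iterator<std::string>(std::cout, " " ));
       std::cout << "\n";
   }
   // swap member: nothing to exchange in this demo type
   void swap(X&) {}
};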
int main()
{
  X x;
  boost::split(x, "a^b^c^", boost::is_any_of(std::string(1, '^')));
}

I think that unordered_set<std::string> should satisfy these requirements as well.
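To illustrate the point with the standard library only, here is a small sketch of the same construct-then-swap step applied to std::unordered_set. The names assign_by_swap and demo_tokens_as_set are hypothetical, the token list is written out by hand rather than produced by boost::split, and a C++11 compiler is assumed.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper mirroring the three lines quoted from iter_split:
// build a temporary from an iterator range, then swap it into the result.
template <typename Container, typename Iter>
Container& assign_by_swap(Container& result, Iter first, Iter last) {
    Container tmp(first, last);  // requires a range constructor
    result.swap(tmp);            // requires a swap() member
    return result;
}

// Feeds the tokens one would expect from splitting "a^b^c^" on '^'
// (including the empty trailing token) into an unordered_set.
inline std::unordered_set<std::string> demo_tokens_as_set() {
    const std::vector<std::string> tokens = {"a", "b", "c", ""};
    std::unordered_set<std::string> tags;
    assign_by_swap(tags, tokens.begin(), tokens.end());
    return tags;
}
```

Nothing in this pattern depends on the container being a sequence, which is why an unordered set looks like a reasonable target here.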

Cubbi
Thanks @Cubbi. So your conclusion is that this is a bug somewhere in Boost 1.42, and that the compiler warnings seen by @Steve on VisualC++/boost 1.44 are misleading?
Jeremy Stribling
@Jeremy Stribling: It's what I would expect, seeing how my test and gcc's unordered_set work where boost's doesn't, but they might have had a good reason. I'd wait for more answers here and test more before calling it a bug.
Cubbi
@Jeremy - see edit, I got this working with `std::unordered_set` instead of `boost::unordered_set`
Steve Townsend
A: 

This was confirmed on the boost-users mailing list to be a bug in the boost::unordered_set implementation. There is a patch available on the mailing list, and a fix will be checked in soon, hopefully in time for boost 1.45.

Boost-users: patch

Boost-users: confirmation

Thanks everyone for looking into this!

Jeremy Stribling