I've been looking at boost::tokenizer, and I've found that the documentation is very thin. Is it possible to make it tokenize a string such as "dolphin--monkey--baboon" so that every word is a token and every double dash is a token as well? The examples I've seen only allow single-character delimiters. Is the library not advanced enough for more complicated delimiters?
A:
It looks like you will need to write your own TokenizerFunction to do what you want.
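For what it's worth, here is a rough sketch of what such a functor could look like. As far as I know, the TokenizerFunction concept asks for a reset() member and an operator() that takes the current iterator by reference, the end iterator, and the token to fill. The names DoubleDashTokenizer and tokenize_double_dash are made up for illustration, and the helper drives the functor by hand so the sketch compiles without Boost.

```cpp
#include <string>
#include <vector>

// Hypothetical functor shaped like Boost's TokenizerFunction concept:
// reset() plus operator()(next, end, tok). It emits each word and each
// "--" as a separate token. The iterators must be at least forward
// iterators, because the functor looks one character ahead.
struct DoubleDashTokenizer {
    void reset() {}  // stateless, so there is nothing to reset

    template <typename InputIterator, typename Token>
    bool operator()(InputIterator& next, InputIterator end, Token& tok) {
        if (next == end) return false;  // no more input
        InputIterator start = next;
        if (starts_with_dashes(next, end)) {
            ++next;  // consume the two-character delimiter
            ++next;
        } else {
            // Consume a word: everything up to the next "--" or the end.
            while (next != end && !starts_with_dashes(next, end)) ++next;
        }
        tok.assign(start, next);
        return true;
    }

private:
    template <typename InputIterator>
    static bool starts_with_dashes(InputIterator it, InputIterator end) {
        if (it == end || *it != '-') return false;
        ++it;
        return it != end && *it == '-';
    }
};

// Hypothetical helper that drives the functor by hand (no Boost needed).
inline std::vector<std::string> tokenize_double_dash(const std::string& s) {
    std::vector<std::string> out;
    DoubleDashTokenizer f;
    f.reset();
    std::string::const_iterator next = s.begin();
    std::string tok;
    while (f(next, s.end(), tok)) out.push_back(tok);
    return out;
}
```

Calling tokenize_double_dash("dolphin--monkey--baboon") gives dolphin, --, monkey, --, baboon. A single dash, as in "mon-key", stays inside its word. With Boost available, the same functor should plug in as boost::tokenizer<DoubleDashTokenizer> tokens(s); though I have not tested that here.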
Adam Batkin
2009-08-09 20:56:44
I see. I was hoping that there'd be something pre-made, but I guess that I was hoping for too much.
Martin
2009-08-09 21:01:26
+1
A:
One option is to try boost::regex. Not sure of the performance compared to a custom tokenizer.
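#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main()
{
    std::string s = "dolphin--monkey--baboon";
    // Match either a run of letters or a double dash.
    // (Inside a character class, '|' is a literal pipe, so it is left out.)
    boost::regex re("[a-zA-Z]+|--");
    // Submatch 0 means "iterate over the whole of each match".
    boost::sregex_token_iterator iter(s.begin(), s.end(), re, 0);
    boost::sregex_token_iterator end_iter;
    while (iter != end_iter)
    {
        std::cout << *iter << '\n';
        ++iter;
    }
}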
Nathan
2009-08-09 21:56:56
A:
Using iter_split allows you to split on multi-character delimiters. Note that the delimiters themselves are discarded rather than returned as tokens.
The code below produces the following output:
dolphin
mon-key
baboon
#include <iostream>
#include <list>
#include <string>
#include <boost/foreach.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string/iter_find.hpp>

int main()
{
    std::string s = "dolphin--mon-key--baboon";
    std::list<std::string> stringList;
    // Split on every occurrence of the two-character delimiter "--".
    boost::iter_split(stringList, s, boost::first_finder("--"));
    BOOST_FOREACH(const std::string& token, stringList)
    {
        std::cout << token << '\n';
    }
}
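Since the question also asks for each double dash to come back as a token, here is a minimal sketch that keeps the delimiters, using only std::string::find. The function name split_keeping_delimiter is hypothetical and is not part of Boost. Like iter_split, it keeps empty pieces when two delimiters are adjacent.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a Boost API): split s on a multi-character
// delimiter and keep every occurrence of the delimiter as its own token.
inline std::vector<std::string> split_keeping_delimiter(const std::string& s,
                                                        const std::string& delim) {
    std::vector<std::string> out;
    if (delim.empty()) {  // nothing to split on
        out.push_back(s);
        return out;
    }
    std::string::size_type pos = 0;
    while (true) {
        std::string::size_type hit = s.find(delim, pos);
        if (hit == std::string::npos) {
            out.push_back(s.substr(pos));  // trailing piece (may be empty)
            break;
        }
        out.push_back(s.substr(pos, hit - pos));  // piece before the delimiter
        out.push_back(delim);                     // the delimiter itself
        pos = hit + delim.size();
    }
    return out;
}
```

Calling split_keeping_delimiter("dolphin--mon-key--baboon", "--") gives dolphin, --, mon-key, --, baboon.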
AnthonyC
2009-10-07 03:16:49