I've been looking at boost::tokenizer, and I've found that the documentation is very thin. Is it possible to make it tokenize a string such as "dolphin--monkey--baboon" so that every word is a token and every double dash is a token as well? The examples I've seen only use single-character delimiters. Is the library not advanced enough for more complicated delimiters?

A: 

It looks like you will need to write your own TokenizerFunction to do what you want.
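
As a rough illustration of what such a function could look like, here is a minimal sketch. It assumes the TokenizerFunction concept only requires a `reset()` member and an `operator()(next, end, tok)` that returns false once the input is exhausted. The names `KeepDelimiterSplitter` and `split_keeping_delimiters` are made up for this example, and the helper drives the functor by hand with standard-library types only, so nothing here has been tested against boost::tokenizer itself.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical functor shaped like the TokenizerFunction concept:
// it exposes reset() and operator()(next, end, tok).
class KeepDelimiterSplitter {
public:
    explicit KeepDelimiterSplitter(std::string delim) : delim_(std::move(delim)) {
        assert(!delim_.empty() && "delimiter must not be empty");
    }

    void reset() {}  // no state is carried between parses

    // Extracts one token starting at `next` and advances `next` past it.
    // Returns false when there is no input left.
    template <typename Iterator, typename Token>
    bool operator()(Iterator& next, Iterator end, Token& tok) {
        if (next == end) return false;
        tok = Token();
        if (at_delimiter(next, end)) {
            // The delimiter itself is emitted as a token.
            tok.assign(delim_.begin(), delim_.end());
            std::advance(next, static_cast<std::ptrdiff_t>(delim_.size()));
            return true;
        }
        // Otherwise consume characters up to the next delimiter or the end.
        while (next != end && !at_delimiter(next, end)) {
            tok += *next;
            ++next;
        }
        return true;
    }

private:
    // True if the full delimiter starts at `it` (needs a multi-pass iterator).
    template <typename Iterator>
    bool at_delimiter(Iterator it, Iterator end) const {
        for (std::size_t i = 0; i < delim_.size(); ++i, ++it) {
            if (it == end || *it != delim_[i]) return false;
        }
        return true;
    }

    std::string delim_;
};

// Hypothetical helper that calls the functor the way a tokenizer would.
std::vector<std::string> split_keeping_delimiters(const std::string& s,
                                                  const std::string& delim) {
    KeepDelimiterSplitter f(delim);
    std::vector<std::string> tokens;
    std::string::const_iterator next = s.begin();
    std::string tok;
    while (f(next, s.end(), tok)) tokens.push_back(tok);
    return tokens;
}
```

Calling `split_keeping_delimiters("dolphin--monkey--baboon", "--")` yields the five tokens dolphin, --, monkey, --, baboon. If the assumption about the concept holds, the same functor should be usable as the TokenizerFunc of a boost::tokenizer, but treat that as something to verify rather than a given.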

Adam Batkin
I see. I was hoping that there'd be something pre-made, but I guess that I was hoping for too much.
Martin
+1  A: 

One option is to use boost::regex. I'm not sure how its performance compares to a custom tokenizer.

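#include <iostream>
#include <string>
#include <boost/regex.hpp>

std::string s = "dolphin--monkey--baboon";

// Match either a run of letters or a double dash. Inside a character
// class '|' is a literal pipe, so the class is written [a-zA-Z].
boost::regex re("[a-zA-Z]+|--");
// Submatch 0 selects each whole match, so words and "--" both become tokens.
boost::sregex_token_iterator iter(s.begin(), s.end(), re, 0);
boost::sregex_token_iterator end_iter;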

while(iter != end_iter)
{
    std::cout << *iter << '\n';
    ++iter;
}
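
If Boost.Regex is not available, the same idea can be sketched with `std::regex` from C++11, which provides `std::sregex_token_iterator` with the same interface. The helper name `regex_tokens` is made up for this example, and the pattern assumes tokens are plain ASCII letters separated by double dashes.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every run of letters and every "--" as a token.
std::vector<std::string> regex_tokens(const std::string& s) {
    static const std::regex re("[a-zA-Z]+|--");
    std::vector<std::string> tokens;
    // Submatch 0 means "the whole match" for each hit of the pattern.
    std::sregex_token_iterator it(s.begin(), s.end(), re, 0);
    std::sregex_token_iterator end;
    for (; it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

For "dolphin--monkey--baboon" this returns dolphin, --, monkey, --, baboon. Characters that match neither alternative (a single dash, digits, spaces) are silently skipped, which may or may not be what you want.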
Nathan
A: 

Using iter_split allows you to split on multi-character delimiters. Note that the "--" delimiters themselves are not returned as tokens. The code below would produce the following:

dolphin
mon-key
baboon

#include <iostream>
#include <list>
#include <string>
#include <boost/foreach.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string/iter_find.hpp>

    // code starts here
    std::string s = "dolphin--mon-key--baboon";
    std::list<std::string> stringList;
    // first_finder("--") matches the whole two-character delimiter,
    // so the single dash in "mon-key" is left alone.
    boost::iter_split(stringList, s, boost::first_finder("--"));

    BOOST_FOREACH(const std::string& token, stringList)
    {
        std::cout << token << '\n';
    }
AnthonyC