I am playing around with the boost strings library and have just come across the awesome simplicity of the split method.

  string delimiters = ",";
  string str = "string, with, comma, delimited, tokens, \"and delimiters, inside a quote\"";
  // If we didn't care about delimiter characters within a quoted section we could use:
  vector<string> tokens;  
  boost::split(tokens, str, boost::is_any_of(delimiters));
  // gives the wrong result: tokens = {"string", " with", " comma", " delimited", " tokens", "\"and delimiters", " inside a quote\""}

That would be nice and concise. However, it doesn't seem to handle quotes, so instead I have to do something like the following:

string delimiters = ",";
string str = "string, with, comma, delimited, tokens, \"and delimiters, inside a quote\"";
vector<string> tokens; 
escaped_list_separator<char> separator("\\",delimiters, "\"");
typedef tokenizer<escaped_list_separator<char> > Tokeniser;
Tokeniser t(str, separator);
for (Tokeniser::iterator it = t.begin(); it != t.end(); ++it)
    tokens.push_back(*it);
// gives the desired grouping (the quote characters themselves are consumed): tokens = {"string", " with", " comma", " delimited", " tokens", " and delimiters, inside a quote"}

My question is: can split, or another standard algorithm, be used when delimiters appear inside quotes? Thanks to purpledog, but I already have a non-deprecated way of achieving the desired outcome. I just think it's quite cumbersome, and unless I can replace it with a simpler, more elegant solution, I wouldn't use it in general without first wrapping it in yet another method.

EDIT: Updated code to show results and clarify question.

+1  A: 

I don't know about the Boost string library, but with boost::regex_token_iterator you can express delimiters as regular expressions. So yes, you can handle quoted delimiters, and far more complex cases as well.

Note that this used to be done with regex_split, which is now deprecated.

Here's an example taken from the Boost documentation:

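#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;

int main(int argc, char*[])
{
   string s;
   do {
      if (argc == 1)
      {
         // No command-line arguments: read the text to split from standard input.
         cout << "Enter text to split (or \"quit\" to exit): ";
         // Stop on "quit" and also on end of input, so the loop cannot spin forever.
         if (!getline(cin, s) || s == "quit") break;
      }
      else
         s = "This is a string of tokens";   // any argument selects this fixed sample

      boost::regex re("\\s+");
      // Submatch index -1 yields the text between matches, i.e. the tokens.
      boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
      boost::sregex_token_iterator j;

      unsigned count = 0;
      while (i != j)
      {
         cout << *i++ << endl;
         count++;
      }
      cout << "There were " << count << " tokens found." << endl;

   } while (argc == 1);
   return 0;
}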

If the program is started without arguments and hello world is entered at the prompt, the output is:

hello
world
There were 2 tokens found.

Changing boost::regex re("\\s+"); to boost::regex re("\",\""); would split on the quoted delimiter ",". Entering hello","world at the prompt would also result in:

hello
world
There were 2 tokens found.
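
The same delimiter-regex idea can be sketched with the standard library alone. This is only an illustration: it assumes C++11's std::regex in place of boost::regex (std::sregex_token_iterator takes the same arguments here), and regex_split is a hypothetical helper name, not a Boost or standard function.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `text` wherever the regular expression `pattern` matches.
// Assumes C++11 std::regex as a stand-in for boost::regex.
std::vector<std::string> regex_split(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::vector<std::string> tokens;
    // Submatch index -1 selects the text between matches rather than the matches themselves.
    std::sregex_token_iterator it(text.begin(), text.end(), re, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it)
        tokens.push_back(it->str());
    return tokens;
}
```

With this sketch, regex_split("hello\",\"world", "\",\"") produces the two tokens hello and world, and the pattern "\\s+" splits on runs of whitespace as in the program above.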

But I suspect you want to deal with input like "hello", "world", in which case one solution is:

  1. split on the comma only
  2. then remove the quote characters (possibly using boost/algorithm/string/trim.hpp or the regex library).
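
The two steps above could look roughly like the following. This is a minimal sketch using only the standard library instead of boost/algorithm/string/trim.hpp; split_and_strip is a hypothetical name chosen for illustration. Note that a comma inside a quoted section is still treated as a delimiter with this approach.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split on commas, then strip surrounding
// whitespace and double quotes from each field.
std::vector<std::string> split_and_strip(const std::string& input)
{
    std::vector<std::string> tokens;
    std::istringstream stream(input);
    std::string field;
    // Step 1: split on the comma only.
    while (std::getline(stream, field, ',')) {
        // Step 2: trim spaces, tabs and quote characters from both ends.
        const std::string::size_type first = field.find_first_not_of(" \t\"");
        if (first == std::string::npos) {
            tokens.push_back(std::string());   // field held nothing but spaces/quotes
            continue;
        }
        const std::string::size_type last = field.find_last_not_of(" \t\"");
        tokens.push_back(field.substr(first, last - first + 1));
    }
    return tokens;
}
```

Calling split_and_strip on the text "hello", "world" gives the tokens hello and world, but a field such as "a, b" is still broken into a and b because step 1 knows nothing about quotes.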

EDIT: added program output

The example you gave would be improved if you showed the output as well, just to make it abundantly clear to anyone who finds this page what the code does.
A. Levy
+1  A: 

It doesn't seem that there is any simple way to do this using the boost::split method. The shortest piece of code I can find to do this is

vector<string> tokens; 
tokenizer<escaped_list_separator<char> > t(str, escaped_list_separator<char>("\\", ",", "\""));
BOOST_FOREACH(string s, t)
    tokens.push_back(s);

which is only marginally more verbose than the original snippet

vector<string> tokens;  
boost::split(tokens, str, boost::is_any_of(","));
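
If the remaining boilerplate is the objection, it can be hidden behind a small helper so that call sites stay as short as the boost::split version. The following is only a sketch under stated assumptions: split_quoted is a hypothetical function, it is written as a plain state machine with the standard library (C++11) rather than with Boost's tokenizer, and unlike escaped_list_separator it does not handle backslash escapes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split on `delimiter`, but treat delimiters that appear
// between a pair of `quote` characters as ordinary text. The quote characters
// themselves are dropped. Escape sequences are not handled in this sketch.
// An empty input yields a single empty token.
std::vector<std::string> split_quoted(const std::string& input,
                                      char delimiter = ',',
                                      char quote = '"')
{
    std::vector<std::string> tokens;
    std::string current;
    bool in_quotes = false;
    for (char c : input) {
        if (c == quote) {
            in_quotes = !in_quotes;          // enter or leave a quoted section
        } else if (c == delimiter && !in_quotes) {
            tokens.push_back(current);       // unquoted delimiter ends the token
            current.clear();
        } else {
            current += c;                    // ordinary character, or quoted delimiter
        }
    }
    tokens.push_back(current);               // final token after the last delimiter
    return tokens;
}
```

A call then reads vector<string> tokens = split_quoted(str); and, for the string in the question, returns six tokens, the last being " and delimiters, inside a quote" with its leading space kept.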
Jamie Cook