Java has a convenient split method:

String str = "The quick brown fox";
String[] results = str.split(" ");

Is there an easy way to do this in C++?

A: 

There is no direct way to do this. Refer to this Code Project source code to see how to build a class for it.

Niyaz
+6  A: 

Here is a sample tokenizer class that might do what you want

//Header file
#include <string>

class Tokenizer
{
    public:
        static const std::string DELIMITERS;
        Tokenizer(const std::string& str);
        Tokenizer(const std::string& str, const std::string& delimiters);
        bool NextToken();
        bool NextToken(const std::string& delimiters);
        const std::string GetToken() const;
        void Reset();
    protected:
        size_t m_offset;
        const std::string m_string;
        std::string m_token;
        std::string m_delimiters;
};

//CPP file
const std::string Tokenizer::DELIMITERS(" \t\n\r");

// Initializers are listed in member declaration order (m_offset first).
Tokenizer::Tokenizer(const std::string& s) :
    m_offset(0),
    m_string(s),
    m_delimiters(DELIMITERS) {}

Tokenizer::Tokenizer(const std::string& s, const std::string& delimiters) :
    m_offset(0),
    m_string(s),
    m_delimiters(delimiters) {}

bool Tokenizer::NextToken() 
{
    return NextToken(m_delimiters);
}

bool Tokenizer::NextToken(const std::string& delimiters) 
{
    size_t i = m_string.find_first_not_of(delimiters, m_offset);
    if (std::string::npos == i)
    {
        m_offset = m_string.length();
        return false;
    }

    size_t j = m_string.find_first_of(delimiters, i);
    if (std::string::npos == j)
    {
        m_token = m_string.substr(i);
        m_offset = m_string.length();
        return true;
    }

    m_token = m_string.substr(i, j - i);
    m_offset = j;
    return true;
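}

// Returns the token found by the most recent successful NextToken() call.
const std::string Tokenizer::GetToken() const
{
    return m_token;
}

// Restarts tokenizing from the beginning of the string.
void Tokenizer::Reset()
{
    m_offset = 0;
}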

Example:

std::vector<std::string> v;
Tokenizer s("split this string", " ");
while (s.NextToken())
{
    v.push_back(s.GetToken());
}
vzczc
+29  A: 

Your simple case can easily be built using the string::find method. However, take a look at Boost.Tokenizer. It's great. Boost generally has some very cool string tools.
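
As a rough illustration of the string::find approach mentioned above, here is a minimal sketch. The function name split_with_find and the single-character delimiter are assumptions made for the example, not part of the answer.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): splits `text` on a single
// delimiter character using std::string::find. Empty fields are kept.
std::vector<std::string> split_with_find(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    std::string::size_type end = text.find(delim, start);
    while (end != std::string::npos) {
        parts.push_back(text.substr(start, end - start));
        start = end + 1;                  // skip past the delimiter
        end = text.find(delim, start);
    }
    parts.push_back(text.substr(start));  // remainder after the last delimiter
    return parts;
}
```

Calling split_with_find("The quick brown fox", ' ') yields the four words. Unlike Java's split, this sketch keeps a trailing empty field when the input ends with the delimiter.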

Konrad Rudolph
+3  A: 

If you're willing to use C, you can use the strtok function. You should pay attention to multi-threading issues when using it.

On Freund
Note that strtok modifies the string you're checking, so you can't use it on const char * strings without making a copy.
Graeme Perrow
The multithreading issue is that strtok uses a global variable to keep track of where it is, so if you have two threads that each use strtok, you'll get undefined behavior.
JohnMcG
A: 

Here's a real simple one:

#include <vector>
#include <string>
using namespace std;

vector<string> split(const char *str, char c = ' ')
{
    vector<string> result;

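    while(1)
    {
        const char *begin = str;

        // advance to the next delimiter or the end of the string
        while(*str != c && *str)
            str++;

        result.push_back(string(begin, str));

        // stop at the terminating null, otherwise step over the delimiter
        if(0 == *str++)
            break;
    }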

    return result;
}
Adam Pierce
A: 

I thought that was what the << operator on string streams was for:

string word << sin;

EDIT: oops! that should have been:

string word; sin >> word;
Daren Thomas
My fault for giving a bad (too simple) example. As far as I know, that only works when your delimiter is whitespace.
Bill the Lizard
Now that I've gotten around to using it, the syntax is sin >> word;
Bill the Lizard
+17  A: 

You can use streams, iterators, and the copy algorithm to do this fairly directly.

#include <string>
#include <vector>
#include <iostream>
#include <istream>
#include <ostream>
#include <iterator>
#include <sstream>
#include <algorithm>

int main()
{
  std::string str = "The quick brown fox";

  // construct a stream from the string
  std::stringstream strstr(str);

  // use stream iterators to copy the stream to the vector as whitespace separated strings
  std::istream_iterator<std::string> it(strstr);
  std::istream_iterator<std::string> end;
  std::vector<std::string> results(it, end);

  // send the vector to stdout, one word per line.
  std::ostream_iterator<std::string> oit(std::cout, "\n");
  std::copy(results.begin(), results.end(), oit);
}
KeithB
I find those std:: irritating to read. Why not use "using"?
@pheze: sir, why don't you edit instead of complaining?
Vadi
@Vadi: because editing someone else's post is quite intrusive. @pheze: I prefer to keep the `std::` prefix; that way I know where my objects come from. That's merely a matter of style.
Matthieu M.
@KeithB I understand your reason and I think it's actually a good choice if it works for you, but from a pedagogical standpoint I actually agree with pheze. It's easier to read and understand a completely foreign example like this one with a "using namespace std" at the top because it requires less effort to interpret the following lines... especially in this case because everything is from the standard library. You can make it easy to read and obvious where the objects come from by a series of "using std::string;" etc. Especially since the function is so short.
cheshirekow
P.S. Thanks... I used this snippet :)
cheshirekow
+14  A: 

Use strtok. In my opinion, there isn't a need to build a class around tokenizing unless strtok doesn't provide you with what you need. It might not, but in 15+ years of writing various parsing code in C and C++, I've always used strtok. Here is an example:

char myString[] = "The quick brown fox";
char *p = strtok(myString, " ");
while (p) {
    printf ("Token: %s\n", p);
    p = strtok(NULL, " ");
}

A few caveats (which might not suit your needs): the string is "destroyed" in the process, meaning that null (end-of-string) characters are written in place of the delimiters. Correct usage might require you to make a non-const copy of the string. You can also change the list of delimiters mid-parse.

In my own opinion, the above code is far simpler and easier to use than writing a separate class for it. To me, this is one of those functions that the language provides and it does it well and cleanly. It's simply a "C based" solution. It's appropriate, it's easy, and you don't have to write a lot of extra code :-)

Mark
Not that I dislike C, however strtok is not thread-safe, and you need to be certain that the string you send it contains a null character to avoid a possible buffer overflow.
tloach
There is strtok_r, but this was a C++ question.
Amigable Clark Kant
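For what it's worth, the two caveats raised here (the input gets modified, and strtok keeps hidden global state) can both be sidestepped from C++ by tokenizing a private copy with strtok_r. The following is only a sketch under assumptions: strtok_r is POSIX rather than standard C++ (so it may be unavailable on some platforms), and the helper name split_with_strtok_r is made up for the example.

```cpp
#include <string.h>   // strtok_r is a POSIX function declared in <string.h>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a copy of `text`, so the caller's string
// is left untouched. Assumes a POSIX system that provides strtok_r.
std::vector<std::string> split_with_strtok_r(const std::string& text,
                                             const char* delims)
{
    // Writable, null-terminated copy for strtok_r to modify.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    char* state = nullptr;  // per-call state instead of strtok's hidden global
    for (char* p = strtok_r(&buffer[0], delims, &state);
         p != nullptr;
         p = strtok_r(nullptr, delims, &state)) {
        tokens.push_back(p);
    }
    return tokens;
}
```

Like strtok, this treats runs of delimiters as a single separator, so empty tokens are never produced.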
+28  A: 

The boost tokenizer class can make this sort of thing quite simple:

#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
   string text = "token, test   string";

   char_separator<char> sep(", ");
   tokenizer<char_separator<char>> tokens(text, sep);
   BOOST_FOREACH(string t, tokens)
   {
      cout << t << "." << endl;
   }
}
Ferruccio
Good stuff, I've recently utilized this. My Visual Studio compiler has an odd whinge until I use a whitespace to separate the two ">" characters before the tokens(text, sep) bit: (error C2947: expecting '>' to terminate template-argument-list, found '>>')
AndyUK
+9  A: 

Boost has a strong split function: boost::algorithm::split.

Raz
A: 

For simple stuff I just use the following:

unsigned TokenizeString(const std::string& i_source,
         const std::string& i_seperators,
         bool i_discard_empty_tokens,
         std::vector<std::string>& o_tokens)
{
    // size_type (not unsigned) so the comparison with npos stays reliable
    // on platforms where size_t is wider than unsigned.
    std::string::size_type prev_pos = 0;
    std::string::size_type pos = 0;
    unsigned number_of_tokens = 0;
    o_tokens.clear();
    pos = i_source.find_first_of(i_seperators, pos);
    while (pos != std::string::npos)
    {
        std::string token = i_source.substr(prev_pos, pos - prev_pos);
        if (!i_discard_empty_tokens || !token.empty())
        {
            o_tokens.push_back(token);
            number_of_tokens++;
        }

        pos++;
        prev_pos = pos;
        pos = i_source.find_first_of(i_seperators, pos);
    }

    // whatever follows the last separator is the final token
    if (prev_pos < i_source.length())
    {
        o_tokens.push_back(i_source.substr(prev_pos));
        number_of_tokens++;
    }

    return number_of_tokens;
}

Cowardly disclaimer: I write real-time data processing software where the data comes in through binary files, sockets, or some API call (I/O cards, cameras). I never use this function for anything more complicated or time-critical than reading external configuration files on startup.

jilles de wit
+1  A: 

I was originally writing a response to Doug's question: C++ Strings Modifying and Extracting based on Separators (closed)

But since Martin York closed that question with a pointer over here... I'll just generalize my code.

No offense folks, but for such a simple problem, you are making things way too complicated. There are a lot of reasons to use BOOST. But for something this simple, it's like hitting a fly with a 20# sledge.

void
split( vector<string> & theStringVector,  /* Altered/returned value */
       const  string  & theString,
       const  string  & theDelimiter )
{
  UASSERT( theDelimiter.size(), >, 0 ); // My own ASSERT macro.

  size_t  start = 0, end = 0;

  while ( end != string::npos )
  {
    end = theString.find( theDelimiter, start );

      // If at end, use length=maxLength.  Else use length=end-start.
    theStringVector.push_back( theString.substr( start,
                   (end == string::npos) ? string::npos : end - start ) );

      // If at end, use start=maxSize.  Else use start=end+delimiter.
    start = (   ( end > (string::npos - theDelimiter.size()) )
              ?  string::npos  :  end + theDelimiter.size()    );
  }
}

E.g.: (For Doug's case.)

int
main()
{
  vector<string> v;

  split( v, "A:PEP:909:Inventory Item", ":" );

#define SHOW(I,X)   cout << "[" << (I) << "]\t " # X " = \"" << (X) << "\"" << endl

  for( unsigned int i = 0;  i < v.size();   i++ )
    SHOW( i, v[i] );
}

And yes, we could have split() return a new vector rather than passing one in. It's trivial to wrap & overload. But depending on what I'm doing, I often find it better to re-use pre-existing objects rather than always creating new ones. (Just as long as I don't forget to empty the vector in between!)
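
The "wrap & overload" idea might look roughly like the sketch below. To keep it self-contained it includes a simplified stand-in for the out-parameter function, with the custom UASSERT macro swapped for the standard assert; both that substitution and the two-argument overload are assumptions for illustration, not the original code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the out-parameter split(); the custom UASSERT
// macro is replaced by the standard assert (an assumption).
void split(std::vector<std::string>& out,
           const std::string& text,
           const std::string& delimiter)
{
    assert(!delimiter.empty());
    std::string::size_type start = 0;
    std::string::size_type end;
    while ((end = text.find(delimiter, start)) != std::string::npos) {
        out.push_back(text.substr(start, end - start));
        start = end + delimiter.size();
    }
    out.push_back(text.substr(start));  // piece after the last delimiter
}

// Hypothetical overload: same logic, but returns a fresh vector.
std::vector<std::string> split(const std::string& text,
                               const std::string& delimiter)
{
    std::vector<std::string> result;
    split(result, text, delimiter);
    return result;
}
```

With the overload in place, a caller can write split("A:PEP:909:Inventory Item", ":") and get the four fields back directly, while the three-argument form remains available for reusing an existing vector.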

Reference: http://www.cplusplus.com/reference/string/string/

Mr.Ree
+1: simplicity is a beautiful thing :)
rubenvb
Thanks...........
Mr.Ree
+8  A: 

Another quick way is to use getline. Something like:

stringstream ss("bla bla");
string s;

while (getline(ss, s, ' ')) {
 cout << s << endl;
}

If you want, you can make a simple split() method returning a vector<string>, which is really useful.
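
A possible shape for such a split() function, wrapping the getline loop above, is sketched here. The name split_with_getline and the single-character delimiter parameter are assumptions for the example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: wraps the getline loop in a function that
// returns the pieces instead of printing them.
std::vector<std::string> split_with_getline(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string item;
    while (std::getline(stream, item, delim)) {
        parts.push_back(item);
    }
    return parts;
}
```

Note that consecutive delimiters produce empty strings in the result, and an empty input produces an empty vector.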

+1  A: 

MFC/ATL has a very nice tokenizer. From MSDN:

CAtlString str( "%First Second#Third" );
CAtlString resToken;
int curPos= 0;

resToken= str.Tokenize("% #",curPos);
while (resToken != "")
{
   printf("Resulting token: %s\n", resToken);
   resToken= str.Tokenize("% #",curPos);
};

Output

Resulting token: First
Resulting token: Second
Resulting token: Third
Jim In Texas
+3  A: 

I know you asked for a C++ solution, but you might consider this helpful:

Qt

#include <QString>

...

QString str = "The quick brown fox"; 
QStringList results = str.split(" "); 

The advantage over Boost in this example is that it's a direct one-to-one mapping to your post's code.

See more at Qt documentation

ShaChris23