ansaurus

Question

Answer 1

+1 A:

The STL does not have such a method available already.

However, you can either use C's strtok function by using the string.c_str() member, or you can write your own. Here is a code sample I found after a quick google search ("STL string split"):

#include <string>
#include <vector>
using namespace std;

void Tokenize(const string& str,
              vector<string>& tokens,
              const string& delimiters = " ")
{
    // Skip delimiters at the beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find the first delimiter after that, i.e. the end of the first token.
    string::size_type pos     = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of".
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the next delimiter, i.e. the end of the next token.
        pos = str.find_first_of(delimiters, lastPos);
    }
}

Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html

If you have questions about the code sample, leave a comment and I will explain.

And just because it does not implement a typedef called iterator or overload the << operator does not mean it is bad code. I use the C functions quite frequently. For example, printf and scanf are both faster than cin and cout (significantly), the fopen syntax is a lot more friendly for binary types, and they also tend to produce smaller EXEs.

Don't get sold on this "Elegance over performance" deal.

nlaq 2008-10-25 09:08:17

I'm aware of the C string functions and I'm aware of the performance issues too (both of which I've noted in my question). However, for this specific question, I'm looking for an elegant C++ solution.

Ashwin 2008-10-25 09:16:15

... and you don't want to just build an OO wrapper over the C functions why?

nlaq 2008-10-25 09:42:20

@Nelson LaQuet: Let me guess: Because strtok is not reentrant?

paercebal 2008-10-25 09:52:14

Why not use the C++ features that are meant for this job?

graham.reeds 2008-10-25 11:54:30

@Nelson don't *ever* pass string.c_str() to strtok! strtok trashes the input string (inserts '\0' chars to replace each found delimiter) and c_str() returns a non-modifiable string.

Evan Teran 2008-10-25 18:19:31

char* ch = new char[str.size()]; strcpy(ch, str.c_str()); ... delete[] ch; // problem solved.

nlaq 2008-10-26 00:20:58

@Nelson: That array needs to be of size str.size() + 1 in your last comment. But I agree with your thesis that it's silly to avoid C functions for "aesthetic" reasons.

j_random_hacker 2009-08-24 09:08:01
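To make the two corrections above concrete (strtok must not write into c_str(), and the copy needs room for the terminator), here is a minimal sketch. The function name split_with_strtok is made up for illustration, and the code inherits strtok's usual limitation of keeping hidden static state between calls.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper, not from the thread: tokenizes a private copy of str
// so that strtok never modifies the caller's string.
std::vector<std::string> split_with_strtok(const std::string& str,
                                           const char* delimiters)
{
    assert(delimiters != NULL);

    // Copy size() + 1 chars so the terminating '\0' is included.
    std::vector<char> buffer(str.c_str(), str.c_str() + str.size() + 1);

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(&buffer[0], delimiters);
         tok != NULL;
         tok = std::strtok(NULL, delimiters))
    {
        tokens.push_back(tok);  // copies the token out of the scratch buffer
    }
    return tokens;
}
```

Using a std::vector<char> instead of new[]/delete[] means the buffer is released even if push_back throws.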

Answer 2

+17 A:

string word;
istringstream iss(line, istringstream::in);

while (iss >> word)
{
    ...
}

This is my favourite way to iterate through a string. You can do what you want per word.

gnomed 2008-10-25 09:16:30

Is it possible to declare `word` as a `char`?

abatishchev 2010-06-26 17:23:27

Sorry abatishchev, C++ is not my strong point. But I imagine it would not be difficult to add an inner loop to loop through every character in each word. Right now, though, I believe the current loop depends on spaces for word separation. Unless you know that there is only a single character between every space, in which case you can just cast "word" to a char... Sorry I can't be of more help; I've been meaning to brush up on my C++.

gnomed 2010-06-30 22:18:00

If you declare word as a char it will iterate over every non-whitespace character. It's simple enough to try: `stringstream ss("Hello World, this is*@#"); char c; while(ss >> c) cout << c;`

Wayne Werner 2010-08-04 18:03:07
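A small sketch of the behaviour described in the last comment, for anyone who wants to check it. The two function names are invented for this example. The second variant uses std::noskipws, which is standard but was not mentioned in the thread, to show what changes when whitespace skipping is turned off.

```cpp
#include <sstream>
#include <string>

// Extracting into a char skips whitespace by default, so only the
// non-whitespace characters are collected.
std::string non_whitespace_chars(const std::string& text)
{
    std::istringstream ss(text);
    std::string out;
    char c;
    while (ss >> c)
        out += c;
    return out;
}

// With std::noskipws every character, including spaces, is extracted.
std::string all_chars(const std::string& text)
{
    std::istringstream ss(text);
    ss >> std::noskipws;
    std::string out;
    char c;
    while (ss >> c)
        out += c;
    return out;
}
```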

Answer 3

+1 A:

Using stringstream as you have works perfectly fine, and does exactly what you wanted. If you're just looking for a different way of doing things though, you can use find/find_first_of and substr.

#include <iostream>
#include <string>

int main()
{
    std::string s("Somewhere down the road");

    std::string::size_type prev_pos = 0, pos = 0;
    while( (pos = s.find(' ', pos)) != std::string::npos )
    {
        std::string substring( s.substr(prev_pos, pos-prev_pos) );

        std::cout << substring << '\n';

        prev_pos = ++pos;
    }
    std::string substring( s.substr(prev_pos) ); // Last word: everything after the final space
    std::cout << substring << '\n';
}

KTC 2008-10-25 09:28:35

Answer 4

A:

For a ridiculously large and probably redundant version, try a lot of for loops.

string stringlist[10];
int count = 0;

for (int i = 0; i < sequence.length(); i++)
{
 if (sequence[i] == ' ')
 {
  stringlist[count] = sequence.substr(0, i);
  sequence.erase(0, i+1);
  i = 0;
  count++;
 }
 else if (i == sequence.length()-1) // Last word
 {
  stringlist[count] = sequence.substr(0, i+1);
 }
}

It isn't pretty, but by and large (barring punctuation and a slew of other bugs) it works!

Peter C. 2008-10-25 09:34:36

I was tempted to +1 this answer for its simple, readable code (which I presume rubbed an elegantophile the wrong way, hence the -1), but then I saw that you allocated a fixed-size array of strings to hold the tokens. Come on, you *know* that's gonna break at the worst possible moment! :)

j_random_hacker 2009-08-24 09:14:34
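For what it's worth, the fixed-size array objection can be addressed without giving up the simple loop. The following is only a sketch of the same idea with a std::vector in place of the array; split_on_spaces is a made-up name, and unlike the answer's loop it leaves the input string untouched and skips runs of spaces.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of the loop above: the vector grows as needed,
// so there is no hard limit of 10 tokens.
std::vector<std::string> split_on_spaces(const std::string& sequence)
{
    std::vector<std::string> words;
    std::string::size_type start = 0;  // index where the current word begins

    // Run one step past the end so the last word is flushed as well.
    for (std::string::size_type i = 0; i <= sequence.length(); ++i)
    {
        if (i == sequence.length() || sequence[i] == ' ')
        {
            if (i > start)  // ignore empty pieces between adjacent spaces
                words.push_back(sequence.substr(start, i - start));
            start = i + 1;
        }
    }
    return words;
}
```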

Answer 5

+3 A:

Shadow2531 2008-10-25 10:01:56

Not a perfect answer to his question, but that's exactly what I was looking for. Thanks!

2009-07-01 17:17:01

Great answer, elegant code with precisely everything that's needed.

Ilya 2009-10-12 14:17:30

Answer 6

+7 A:

This is similar to this question.

#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
   string text = "token  test\tstring";

   char_separator<char> sep(" \t");
   tokenizer<char_separator<char>> tokens(text, sep);
   BOOST_FOREACH(string t, tokens)
   {
      cout << t << "." << endl;
   }
}

Ferruccio 2008-10-25 10:58:25

Thanks for pointing that out. I didn't know this operation was called tokenizing, so it never occurred to me to search for that term :-)

Ashwin 2008-10-27 02:48:04

Answer 7

+50 A:

I use this to split a string by a delimiter. The first version puts the results in an already constructed vector, the second returns a new vector.

std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}


std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    return split(s, delim, elems);
}

Evan Teran 2008-10-25 18:21:27

i really <3 that solution. one convenient and one fast-without-compromise :)

Johannes Schaub - litb 2009-03-02 00:30:34

Works brilliantly! Don't forget to #include `string`, `sstream` and `vector`.

Paul Lammertsma 2009-12-07 13:00:38

<3 the snippet. thanks a lot.

huy 2010-01-23 21:59:35

This hits the sweet spot for me - standard libraries, short, and lets me specify my delimiters. Thanks!

tfinniga 2010-03-29 16:39:40

Elegant solution, I always forget about this particular "getline", though I do not believe it is aware of quotes and escape sequences.

boskom 2010-05-27 13:32:12

+1 Short and crisp

Favonius 2010-08-10 12:35:32
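One detail worth knowing about the getline approach is how it treats empty fields: "a,,b" produces an empty item in the middle, while a trailing delimiter as in "a,b," does not produce a trailing empty item. The sketch below is a variation on the answer, not the answer's own code; the name split_fields and the keep_empty flag are assumptions added for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of the getline-based split with an option to drop
// empty fields. Like the original, it knows nothing about quoting.
std::vector<std::string> split_fields(const std::string& s, char delim,
                                      bool keep_empty)
{
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim))
    {
        if (keep_empty || !item.empty())
            elems.push_back(item);
    }
    return elems;
}
```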

Answer 8

+60 A:

Since everybody is already using Boost:

#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));

I bet this is much faster than the stringstream solution. And since this is a generic template function it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.

See the documentation for details.

ididak 2008-10-25 20:28:20

This is a good solution too! :-)

Ashwin 2008-10-27 02:49:50

Speed is irrelevant here, as both of these cases are much slower than a strtok-like function.

Tom 2009-03-01 16:51:08

This is practical and quick enough if you know the line will contain just a few tokens, but if it contains many then you will burn a ton of memory (and time) growing the vector. So no, it's not faster than the stringstream solution -- at least not for large n, which is the only case where speed matters.

j_random_hacker 2009-08-24 09:02:43

And for those who don't already have boost... bcp copies over 1,000 files for this :)

romkyns 2010-06-09 20:12:22

Answer 9

+56 A:

FWIW, here's another way to extract tokens from an input string, relying only on Standard Library facilities. It's an example of the power and elegance behind the design of the STL.

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace std;
    string sentence = "Something in the way she moves...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
             istream_iterator<string>(),
             ostream_iterator<string>(cout, "\n"));
}

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

vector<string> tokens;
copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         back_inserter<vector<string> >(tokens));

Best regards.

Zunino 2008-10-26 00:43:09

Your solution doesn't even need Boost. Very cool! :-)

Ashwin 2008-10-27 02:54:33

Is it possible to specify a delimiter for this? Like for instance splitting on commas?

l3dx 2009-08-06 11:49:46

@l3dx: it seems that the parameter "\n" is the delimiter. This code is very nice, but I would like to know better about it. Maybe somebody could explain each line of that snippet?

Jonathan 2009-12-11 17:30:37

@Jonathan: \n is not the delimiter in this case, it's the delimiter for outputting to cout.

huy 2010-02-03 12:37:03

So can you split on comma?

graham.reeds 2010-07-22 09:09:30

Really nice code, but what about the delimiter? I guess this only works with whitespace.

wok 2010-08-01 14:46:25

Based on this: http://www.cplusplus.com/reference/algorithm/copy/ no. The whitespace behavior is a function of the `istream_iterator`. It would be more elegant to roll your own.

Wayne Werner 2010-08-04 17:59:24

It doesn't work for me for some reason; it crashes while running.

Michael Sync 2010-08-28 12:33:14

@graham.reeds, @l3dx: Please don't write another CSV parser which can't handle quoted fields: http://en.wikipedia.org/wiki/Comma-separated_values

Douglas 2010-09-01 09:30:55

I wasn't planning on it. Never knew CSV had an RFC for it!

graham.reeds 2010-09-02 11:21:24
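On the repeated question of splitting on commas with istream_iterator: since the iterator relies on the stream's notion of whitespace, one possible approach is to give the stream a locale whose ctype facet also classifies ',' as whitespace. This is a sketch of that idiom, not something proposed in the thread; comma_is_space and split_commas_and_spaces are made-up names, the result splits on commas and ordinary whitespace alike, and it is emphatically not a CSV parser (no quoted fields, empty fields are dropped).

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: the classic "C" classification table, with ','
// additionally marked as whitespace.
struct comma_is_space : std::ctype<char>
{
    comma_is_space() : std::ctype<char>(make_table()) {}

    static const mask* make_table()
    {
        // Start from a copy of the classic table, then flag the comma.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[','] |= space;
        return &table[0];
    }
};

std::vector<std::string> split_commas_and_spaces(const std::string& line)
{
    std::istringstream iss(line);
    // The locale takes ownership of the facet and deletes it when done.
    iss.imbue(std::locale(iss.getloc(), new comma_is_space));
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}
```

For a single-character delimiter, the getline-based split from the other answer is simpler; the facet is mainly interesting if you want to keep the copy/istream_iterator style.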

Answer 10

+5 A:

For those with whom it does not sit well to sacrifice all efficiency for code size and see "efficient" as a type of elegance, the following should hit a sweet spot (and I think the template container class is an awesomely elegant addition.):

template < class ContainerT >
void tokenize(const std::string& str, ContainerT& tokens, const std::string& delimiters = " ", const bool trimEmpty = false)
{
   // Dependent type names need "typename" on conforming compilers.
   typedef typename ContainerT::value_type ValueType;
   typedef typename ValueType::size_type SizeType;

   std::string::size_type pos, lastPos = 0;
   while(true)
   {
      pos = str.find_first_of(delimiters, lastPos);
      if(pos == std::string::npos)
      {
         // No more delimiters: the rest of the string is the last token.
         pos = str.length();

         if(pos != lastPos || !trimEmpty)
            tokens.push_back(ValueType(str.data()+lastPos, (SizeType)(pos-lastPos)));

         break;
      }
      else
      {
         if(pos != lastPos || !trimEmpty)
            tokens.push_back(ValueType(str.data()+lastPos, (SizeType)(pos-lastPos)));
      }

      lastPos = pos + 1;
   }
}

I usually choose to use std::vector<std::string> types as my second parameter (ContainerT)... but list<> is way faster than vector<> for when direct access is not needed, and you can even create your own string class and use something like std::list<SubString> where SubString does not do any copies for incredible speed increases.

It's more than twice as fast as the fastest tokenize on this page and almost 5 times faster than some others. Also with the perfect parameter types you can eliminate all string and list copies.

Additionally it does not do the (extremely inefficient) return of result, but rather it passes the tokens as a reference, thus also allowing you to build up tokens using multiple calls if you so wished.

Lastly it allows you to specify whether to trim empty tokens from the results via a last optional parameter.

All it needs is std::string... the rest are optional. It does not use streams or the boost library, but is flexible enough to be able to accept some of these foreign types naturally.

Marius.

Marius 2009-09-29 15:12:11
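The SubString idea mentioned above is not spelled out in the answer, so here is one way it might look. Everything about SubString (its members, the str() helper) and the split_views wrapper is an assumption made for this sketch; the tokenize template is a condensed restatement of the answer's function so that the example compiles on its own. The views point into the original string, so that string has to outlive them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical non-owning token type: a pointer and a length, no copy.
struct SubString
{
    typedef std::size_t size_type;
    const char* data;
    size_type   length;

    SubString(const char* d, size_type n) : data(d), length(n) {}
    std::string str() const { return std::string(data, length); }
};

// Condensed form of the answer's tokenize, repeated to keep this self-contained.
template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    typedef typename ContainerT::value_type ValueType;
    typedef typename ValueType::size_type SizeType;

    std::string::size_type lastPos = 0;
    while (true)
    {
        std::string::size_type pos = str.find_first_of(delimiters, lastPos);
        const bool last = (pos == std::string::npos);
        if (last)
            pos = str.length();
        if (pos != lastPos || !trimEmpty)
            tokens.push_back(ValueType(str.data() + lastPos,
                                       static_cast<SizeType>(pos - lastPos)));
        if (last)
            break;
        lastPos = pos + 1;
    }
}

// Convenience wrapper: the returned views are only valid while line is alive.
std::vector<SubString> split_views(const std::string& line,
                                   const std::string& delimiters)
{
    std::vector<SubString> views;
    tokenize(line, views, delimiters);
    return views;
}
```

The same template still works with std::vector<std::string>, and calling it several times on the same container appends to it, as described in the answer.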

Answer 11

+2 A:

In case anyone is interested, the minimalist version that relies upon getline is the fastest on my test machine. (Boost-based solution not tested!)

Surprise, surprise

Lesson learned, don't reinvent the wheel !

DamnedYankee 2009-11-24 14:16:13

"don't reinvent the wheel !" - unless you're a wheel engineer. Also, never forget the "my wheel is better than yours" argument! ;-)

Johann Gerell 2010-09-01 09:31:41

Answer 12

+1 A:

Here's another way of doing it..

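#include <cctype>
#include <string>
#include <vector>
using namespace std;

void split_string(const string& text, vector<string>& words)
{
  string word;

  // Index-based loop: never reads past text.size() and does not stop
  // early at an embedded '\0'.
  for (string::size_type i = 0; i < text.size(); ++i)
  {
    const char ch = text[i];
    // isspace requires a value representable as unsigned char.
    if (isspace(static_cast<unsigned char>(ch)))
    {
      if (!word.empty())
      {
        words.push_back(word);
      }
      word = "";
    }
    else
    {
      word += ch;
    }
  }
  // Flush the final word if the text did not end in whitespace.
  if (!word.empty())
  {
    words.push_back(word);
  }
}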

Usama S. 2010-01-08 03:21:16

Answer 13

+2 A:

Yet another flexible and fast way

#include <cstring>  // strchr

template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
  const char* s = input;
  const char* e = s;
  while (*e != 0) {
    e = s;
    while (*e != 0 && strchr(delimiters, *e) == 0) ++e;
    if (e - s > 0) {
      op(s, e - s);
    }
    s = e + 1;
  }
}

To use it with a vector of strings:

class Appender : public std::vector<std::string> {
public:
  void operator() (const char* s, unsigned length) { 
    this->push_back(std::string(s,length));
  }
};

Appender v;
tokenize(v, "A number of words to be tokenized", " \t");

That's it! And that's just one way to use the tokenizer. Here is how to just count words instead:

class WordCounter {
public:
  WordCounter() : noOfWords(0) {}
  void operator() (const char*, unsigned) {
    ++noOfWords;
  }
  unsigned noOfWords;
};

WordCounter wc;
tokenize(wc, "A number of words to be counted", " \t"); 
ASSERT( wc.noOfWords == 7 );

Limited by imagination ;)

Robert 2010-04-01 14:16:43
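As one more illustration of the callback style, the operator does not have to be a hand-written class. Assuming a C++11 compiler, a lambda works as well, as long as it is stored in a named variable (the template takes Operator& and so will not accept a temporary). The tokenize function is repeated from the answer so the sketch compiles on its own; longest_token is a made-up example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Repeated from the answer above.
template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
  const char* s = input;
  const char* e = s;
  while (*e != 0) {
    e = s;
    while (*e != 0 && strchr(delimiters, *e) == 0) ++e;
    if (e - s > 0) {
      op(s, e - s);
    }
    s = e + 1;
  }
}

// Hypothetical use: find the longest token without storing all of them.
std::string longest_token(const char* input, const char* delimiters) {
  assert(input != NULL && delimiters != NULL);

  std::string longest;
  auto keep_longest = [&longest](const char* s, std::size_t length) {
    if (length > longest.size())
      longest.assign(s, length);  // first token of maximal length wins
  };
  tokenize(keep_longest, input, delimiters);
  return longest;
}
```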

Answer 14

A:

There is a function named strtok.

#include <cstring>  // strtok
#include <string>
#include <vector>
using namespace std;

vector<string> split(char* str,const char* delim)
{
    char* token = strtok(str,delim);

    vector<string> result;

    while(token != NULL)
    {
        result.push_back(token);
        token = strtok(NULL,delim);
    }
    return result;
}

TheMachineCharmer 2010-06-14 12:17:08

`strtok` is from the C standard library, not C++. It is not safe to use in multithreaded programs. It modifies the input string.

Kevin Panko 2010-06-14 14:07:03

@Kevin Panko: Thanks! Would you please explain why is it not safe to use in multi-threaded programs?

TheMachineCharmer 2010-06-14 16:17:44

Because it stores the char pointer from the first call in a static variable, so that on the subsequent calls when NULL is passed, it remembers what pointer should be used. If a second thread calls `strtok` when another thread is still processing, this char pointer will be overwritten, and both threads will then have incorrect results. http://www.mkssoftware.com/docs/man3/strtok.3.asp

Kevin Panko 2010-06-14 17:27:39

Thanks @Kevin Panko!! for the eye opener :)

TheMachineCharmer 2010-06-15 07:18:39

As mentioned before, strtok is unsafe, and even in C strtok_r is recommended instead.

systemsfault 2010-07-06 12:17:05
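To show what the reentrant variant buys you, here is a sketch that tokenizes at two levels at once, which plain strtok cannot do because of its single static pointer. It assumes a POSIX system, since strtok_r is not part of standard C or C++ (Windows offers strtok_s instead). The function name parse_pairs and the "key=value;key=value" input format are invented for this example.

```cpp
#include <string.h>  // strtok_r (POSIX)
#include <string>
#include <utility>
#include <vector>

// Hypothetical example: splits "a=1;b=2" into (key, value) pairs.
// Each strtok_r call chain keeps its position in its own state variable.
std::vector<std::pair<std::string, std::string> >
parse_pairs(const std::string& input)
{
    std::vector<std::pair<std::string, std::string> > result;

    // strtok_r modifies its input, so work on a '\0'-terminated copy.
    std::vector<char> buf(input.begin(), input.end());
    buf.push_back('\0');

    char* outer_state = NULL;
    for (char* item = strtok_r(&buf[0], ";", &outer_state);
         item != NULL;
         item = strtok_r(NULL, ";", &outer_state))
    {
        // Inner tokenization does not disturb the outer one.
        char* inner_state = NULL;
        char* key   = strtok_r(item, "=", &inner_state);
        char* value = strtok_r(NULL, "=", &inner_state);
        if (key != NULL && value != NULL)
            result.push_back(std::make_pair(std::string(key),
                                            std::string(value)));
    }
    return result;
}
```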

Answer 15

A:

See my answer here if you can use Qt.

ShaChris23 2010-08-04 17:48:54

Answer 16

A:

I use this simpleton because our String class is "special" (i.e. not standard):

void splitString(const String &s, const String &delim, std::vector<String> &result) {
    const int l = delim.length();
    int f = 0;
    int i = s.indexOf(delim,f);
    while (i>=0) {
        String token( i-f > 0 ? s.substring(f,i-f) : "");
        result.push_back(token);
        f=i+l;
        i = s.indexOf(delim,f);
    }
    String token = s.substring(f);
    result.push_back(token);
}

Abe 2010-09-01 09:25:52

How to split a string?