views:

377

answers:

1

A common piece of code I use for simple string splitting looks like this:

inline std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

Someone mentioned that this will silently "swallow" errors occurring in std::getline. And of course I agree that's the case. But it occurred to me, what could possibly go wrong here in practice that I would need to worry about. basically it all boils down to this:

inline std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }

    if(/* what error can I catch here? */) {
        // *** How did we get here!? ***
    }

    return elems;
}

A stringstream is backed by a string, so we don't have to worry about any of the issues associated with reading from a file. There is no type conversion going on here since getline simply reads until it sees the line delimeter or EOF. So we can't get any of the errors that something like boost::lexical_cast has to worry about.

I simply can't think of something besides failing to allocate enough memory that could go wrong, but that'll just throw a std::bad_alloc well before the std::getline even takes place. What am I missing?

+2  A: 

I can't imagine what errors this person thinks might happen, and you should ask them to explain. Nothing can go wrong except allocation errors, as you mentioned, which are thrown and not swallowed.

The only thing I see that you're directly missing is that ss.fail() is guaranteed to be true after the while loop, because that's the condition being tested. (bool(stream) is equivalent to !stream.fail(), not stream.good().) As expected, ss.eof() will also be true, indicating the failure was due to EOF.

However, there might be some confusion over what is actually happening. Because getline uses delim-terminated fields rather than delim-separated fields, input data such as "a\nb\n" has two instead of three fields, and this might be surprising. For lines this makes complete sense (and is POSIX standard), but how many fields, with a delim of '-', would you expect to find in "a-b-" after splitting?


Incidentally, here's how I'd write split:

template<class OutIter>
OutIter split(std::string const& s, char delim, OutIter dest) {
  std::string::size_type begin = 0, end;
  while ((end = s.find(delim, begin)) != s.npos) {
    *dest++ = s.substr(begin, end - begin);
    begin = end + 1;
  }
  *dest++ = s.substr(begin);
  return dest;
}

This avoids all of the problems with iostreams in the first place, avoids extra copies (the stringstream's backing string; plus the temp returned by substr can even use a C++0x rvalue reference for move semantics if supported, as written), has the behavior I expect from split (different from yours), and works with any container.

deque<string> c;
split("a-b-", '-', back_inserter(c));
// c == {"a", "b", ""}
Roger Pate
good point about the use `s.fail()`, i suppose `s.bad()` would be a better choice? or perhaps `!s.eof()`? (it should end due to EOF, so if it isn't EOF, then it failed right?)
Evan Teran
Also, good point about terminated vs separated fields. I've never had an issue with that before, but I could see it being surprising. All the more reason to test the number of fields you got before extracting your data from the result.
Evan Teran
@Evan: First identify the condition you're trying to check. There's no need to check fail, bad, eof, or anything else on *ss* after the loop, but you might want to check *elems*, as you said.
Roger Pate
well that check is currently hypothetical at the moment ;). Someone said this swallows an error, the question is, how would i catch this error? It seems that the error check simply isn't necessary, I just need to check that I got the amount i expected.
Evan Teran
@Evan: What error is it supposed to be swallowing? I can't find one, and am inclined to say they're flat-out wrong. (Unless they mean about -terminated vs -separated, but that's not an error in getline, it's a misunderstanding.)
Roger Pate
It was in reference to this post I made: http://stackoverflow.com/questions/2552839/which-c-standard-library-wrapper-functions-do-you-use/2556192#2556192The first comment claims that it silently swallows and error, so far no specifics have come from the discussion though.
Evan Teran
sbi