I have data in the following format:

4:How do you do?
10:Happy birthday
1:Purple monkey dishwasher
200:The Ancestral Territorial Imperatives of the Trumpeter Swan

The number can be anywhere from 1 to 999, and the string is at most 255 characters long. I'm new to C++ and it seems a few sources recommend extracting formatted data with a stream's >> operator, but when I want to extract a string it stops at the first whitespace character. Is there a way to configure a stream to stop parsing a string only at a newline or end-of-file? I saw that there was a getline method to extract an entire line, but then I still have to split it up manually [with find_first_of], don't I?

Is there an easy way to parse data in this format using only STL?

+2  A: 
N 1.1
It looks like the `m` flag is not standardised, so I can't use it. But, again, won't this still only read to the first whitespace character instead of to the end of the line?
dreamlax
@dreamlax thanks for pointing out. Corrected.
N 1.1
+7  A: 

You can read the number before you use getline, which reads from a stream and stores into a string object. Something like this:

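int num;
string str;

while(cin>>num){       // read the leading number
    getline(cin,str);  // read the rest of the line; str still begins with ':'
    // use num and str here
}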
codaddict
That looks clean; I assume it will be safe to replace `cin` with an `istream` that I am given?
dreamlax
If you are reading from a file, you can replace cin with a **valid** ifstream object.
codaddict
I'm just given a stream, and my code is expected to parse the data, manipulate it and write it to another stream. I don't create either stream. I assume my filter wouldn't be invoked if the `istream` or `ostream` was invalid, but at the same time I don't think it's any of my concern. Garbage in garbage out :) . . . or maybe garbage in segfault out.
dreamlax
I had an extra `char` variable and did `while (cin >> num >> dummy)` to get rid of the colon character.
dreamlax
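
Putting the answer and the comments above together, a minimal sketch might look like the following. The names `Record`, `read_records` and `read_records_from` are made up for illustration, and the function takes any `std::istream&` rather than `cin`, on the assumption that the caller hands over an already-open stream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type: (number, text).
typedef std::pair<int, std::string> Record;

// Reads "number:text" lines from any input stream until extraction fails.
std::vector<Record> read_records(std::istream& in) {
    std::vector<Record> records;
    int num;
    char colon;        // receives the ':' separator
    std::string text;

    // >> skips leading whitespace (including the previous newline),
    // reads the number, then reads one non-whitespace character.
    // getline then takes everything up to the end of the line.
    while (in >> num >> colon && colon == ':' && std::getline(in, text)) {
        records.push_back(Record(num, text));
    }
    return records;
}

// Convenience wrapper for trying the function on an in-memory string.
std::vector<Record> read_records_from(const std::string& s) {
    std::istringstream in(s);
    return read_records(in);
}
```

The loop stops at the first line that does not start with a number followed by a colon, so malformed input ends parsing early instead of being skipped.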
+2  A: 

Just read the data line by line (whole line) using getline and parse it.
To parse use find_first_of()
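
As a rough sketch of that approach (the names `Entry`, `split_line` and `parse_lines` are invented here, and silently skipping malformed lines is just one possible policy):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry type: (number, text).
typedef std::pair<int, std::string> Entry;

// Splits one "number:text" line at the first ':'.
// Returns false if the line has no ':' or no valid leading number.
bool split_line(const std::string& line, Entry& out) {
    std::string::size_type pos = line.find_first_of(':');
    if (pos == std::string::npos) return false;

    // Convert the part before the colon to an int.
    std::istringstream num(line.substr(0, pos));
    int n;
    if (!(num >> n)) return false;

    out.first = n;
    out.second = line.substr(pos + 1);  // everything after the first ':'
    return true;
}

// Reads the stream line by line and keeps the lines that parse.
std::vector<Entry> parse_lines(std::istream& in) {
    std::vector<Entry> entries;
    std::string line;
    while (std::getline(in, line)) {
        Entry e;
        if (split_line(line, e)) entries.push_back(e);  // skip malformed lines
    }
    return entries;
}
```

Because only the first colon is treated as the separator, any further colons in the text stay part of the string.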

Dmitriy
+1  A: 

You've already been told about std::getline, but they didn't mention one detail that you'll probably find useful: when you call getline, you can also pass a parameter telling it which character to treat as the delimiter (by default, the newline). To read your number, you can use:

std::string number;
std::string name;

std::getline(infile, number, ':');
std::getline(infile, name);   

This will put the data up to the ':' into number, discard the ':', and read the rest of the line into name.
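
Extended to a whole file, the same two calls could be wrapped in a loop along these lines. This is only a sketch: `Item` and `read_items` are hypothetical names, and converting the number with a `std::istringstream` is one choice among several.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical item type: (number, name).
typedef std::pair<int, std::string> Item;

std::vector<Item> read_items(std::istream& infile) {
    std::vector<Item> items;
    std::string number;
    std::string name;

    // The first getline stops at ':' and discards it;
    // the second reads the rest of the line.
    // Caveat: a line without ':' makes the first getline run on
    // into the following line, so this assumes well-formed input.
    while (std::getline(infile, number, ':') && std::getline(infile, name)) {
        std::istringstream conv(number);
        int n;
        if (conv >> n) items.push_back(Item(n, name));  // drop non-numeric prefixes
    }
    return items;
}
```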

If you want to use >> to read the data, you can do that too, but it's a bit more difficult, and delves into an area of the standard library that most people never touch. A stream has an associated locale that's used for things like formatting numbers and (importantly) determining what constitutes "white space". You can define your own locale to define the ":" as white space, and the space (" ") as not white space. Tell the stream to use that locale, and it'll let you read your data directly.

#include <locale>
#include <vector>

struct colonsep: std::ctype<char> {
    colonsep(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> 
            rc(std::ctype<char>::table_size,std::ctype_base::mask());

        rc[':'] = std::ctype_base::space;
        rc['\n'] = std::ctype_base::space;
        return &rc[0];
    }
};

Now to use it, we "imbue" the stream with a locale:

#include <fstream>
#include <iterator>
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>

typedef std::pair<int, std::string> data;

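// These overloads live in namespace std so that istream_iterator and
// ostream_iterator can find them through argument-dependent lookup.
// Strictly speaking, the standard does not allow adding overloads to
// namespace std; wrapping the pair in your own struct avoids the issue.
namespace std { 
    std::istream &operator>>(std::istream &is, data &d) { 
        return is >> d.first >> d.second;
    }
    std::ostream &operator<<(std::ostream &os, data const &d) { 
        return os << d.first << ":" << d.second;
    }
}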

int main() {
    std::ifstream infile("testfile.txt");
    infile.imbue(std::locale(std::locale(), new colonsep));

    std::vector<data> d;

    std::copy(std::istream_iterator<data>(infile), 
              std::istream_iterator<data>(),
              std::back_inserter(d));

    // just for fun, sort the data to show we can manipulate it:
    std::sort(d.begin(), d.end());

    std::copy(d.begin(), d.end(), std::ostream_iterator<data>(std::cout, "\n"));
    return 0;
}

Now you know why that part of the library is so neglected. In theory, getting the standard library to do your work for you is great -- but in fact, most of the time it's easier to do this kind of job on your own instead.

Jerry Coffin