Hi, I'm trying to adapt this answer

http://stackoverflow.com/questions/53849/how-do-i-tokenize-a-string-in-c#53921

to my current string problem, which involves reading from a file until EOF.

from this source file:

Fix grammatical or spelling errors

Clarify meaning without changing it

Correct minor mistakes

I want to create a vector with all the tokenized words. For example, given vector<string> allTheText, allTheText[0] should be "Fix".

I don't understand the purpose of istream_iterator<std::string> end; but I included it because it was in the original poster's answer.

So far, I've got this non-working code:

vector<string> allTheText;
stringstream strstr;
istream_iterator<std::string> end;
istream_iterator<std::string> it(strstr);

while (!streamOfText.eof()) {
    getline(streamOfText, readTextLine);
    cout << readTextLine << endl;

    stringstream strstr(readTextLine);
    // how should I initialize the iterators it and end here?
}

Edit:

I changed the code to

vector<string> allTheText;
stringstream strstr;
istream_iterator<std::string> end;
istream_iterator<std::string> it(strstr);

while (getline(streamOfText, readTextLine)) {
    cout << readTextLine << endl;

    vector<string> vec((istream_iterator<string>(streamOfText)), istream_iterator<string>()); // generates RuntimeError
}

And got a RuntimeError, why?

+8  A: 

Using a while (!….eof()) loop in C++ is broken because the loop will never be exited when the stream goes into an error state!

Rather, you should test the stream's state directly. Adapted to your code, this could look like this:

while (getline(streamOfText, readTextLine)) {
    cout << readTextLine << endl;
}

However, you already have a stream. Why put it into a string stream as well? Or do you need to do this line by line for any reason?
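
If the input does need to be handled line by line, one option is to wrap each line in its own string stream and pull the words out of that. Here is a minimal sketch, assuming a hypothetical helper named tokenizeLines (not part of the question's code) that accepts any std::istream:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads `in` line by line and collects every
// whitespace-separated word into one vector.
std::vector<std::string> tokenizeLines(std::istream& in) {
    std::vector<std::string> words;
    std::string line;
    // getline returns the stream, which tests as false once a read fails
    // (end of file or an error), so the loop always terminates.
    while (std::getline(in, line)) {
        std::istringstream lineStream(line);  // fresh stream for each line
        std::string word;
        while (lineStream >> word)
            words.push_back(word);
    }
    return words;
}
```

Blank lines in the file contribute no words, because extraction from an empty line stream fails immediately.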

You can directly initialize your vector with the input iterators. No need to build a string stream, and no need to use the copy algorithm either because there's an appropriate constructor overload.

vector<string> vec((istream_iterator<string>(cin)), istream_iterator<string>());

Notice the extra parentheses around the first argument which are necessary to disambiguate this from a function declaration.

EDIT: A short explanation of what this code does:

C++ offers a unified way of specifying ranges. A range is just a collection of typed values, without going into details about how these values are stored. In C++, these ranges are denoted as half-open intervals [a, b[. That means that a range is delimited by two iterators (which are kind of like pointers but more general; pointers are a special kind of iterator). The first iterator, a, points to the first element of the range. The second, b, points behind the last element. Why behind? Because this makes it very easy to iterate over the elements:

for (Iterator i = a; i != b; ++i)
    cout << *i;

Like pointers, iterators are dereferenced by applying * to them. This returns their value.

Container classes in C++ (e.g. vector, list) have a special constructor which allows easy copying of values from another range into the new container. Consequently, this constructor expects two iterators. For example, the following copies the C-style array into the vector:

int values[3] = { 1, 2, 3 };
vector<int> v(values, values + 3);

Here, values is synonymous with &values[0] which means that it points to the array's first element. values + 3, thanks to pointer arithmetic, is nearly equivalent to &values[3] (but this is invalid C++!) and points to the virtual element behind the array.
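
Hard-coding the length 3 is easy to get wrong when the array changes. One way around that, sketched here with a hypothetical helper named toVector, is to let the compiler deduce the array length and compute the past-the-end pointer from it:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copies a C-style int array of any length into a vector.
template <std::size_t N>
std::vector<int> toVector(const int (&values)[N]) {
    // values + N is the one-past-the-end pointer. It is only used as the
    // end of the range and is never dereferenced.
    return std::vector<int>(values, values + N);
}
```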

Now, my code above does exactly the same thing as this last example. The only difference is the type of iterator I use. Instead of using a plain pointer, I use a special iterator class that C++ provides. This iterator class wraps an input stream in such a way that ++ reads the next element from the stream and * returns the element that was read. The kind of element is specified by the type argument (hence string in this case).

To make this work as a range, we need to specify a beginning and an end. Alas, we don't know the end of the input (this is logical, since the end of the stream may actually move over time as the user enters more input into a console!). Therefore, to create a virtual end iterator, we pass no argument to the constructor of istream_iterator. Conversely, to create a begin iterator, we pass an input stream. This then creates an iterator that points to the current position in the stream (here, cin). Once the stream has nothing more to deliver, this iterator compares equal to the end iterator.

My code above is functionally equivalent to the following:
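
To see the end iterator at work, here is a minimal sketch that counts the words in a string. The helper name countWords is made up for this example, and an std::istringstream stands in for cin or a file.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: counts the words in `text` by advancing a stream
// iterator until it compares equal to the end-of-stream iterator.
int countWords(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<std::string> it(in);  // begin iterator, bound to the stream
    std::istream_iterator<std::string> end;     // no argument: end-of-stream iterator
    int count = 0;
    while (it != end) {  // `it` equals `end` once no further word can be read
        ++count;
        ++it;
    }
    return count;
}
```

For input that contains no words at all, the begin iterator equals the end iterator right away and the count stays at zero.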

istream_iterator<string> front(cin);
istream_iterator<string> back;

vector<string> vec;

for (istream_iterator<string> i = front; i != back; ++i)
    vec.push_back(*i);

and this, in turn, is equivalent to using the following loop:

string word;
while (cin >> word)
    vec.push_back(word);
Konrad Rudolph
Why is there a (istream_iterator<string>(cin)? Why cin? And will this vector get the whole content of the text file without overwriting?
omgzor
I mean, I'll read the text file from an ifstream, I think I should place the name of the ifstream where cin is placed, right?
omgzor
@dmindreader: cin was the example - an istream_iterator<string> can be constructed from any istream. @Konrad: You might clarify that the no-arg constructor for istream_iterator<string> is the way of saying 'nothing more to get from the stream' (I forget how Josuttis put it, I know it was better).
Harper Shelby
@dmindreader: yes. `cin` was just an example, mainly used because it's shorter than your variable name. ;-)
Konrad Rudolph
@Harper: I've posted a bit of explanation. Does this satisfy you? I've no doubt that Josuttis has put this much better than me but then I can only advise everyone to read Josuttis' books.
Konrad Rudolph
I tried this: istream_iterator<string> front(streamOfText); istream_iterator<string> back(); vector<string> vec; for (istream_iterator<string> i = front; i != back; ++i) vec.push_back(*i); and got a compile error saying "no match for 'operator!=' in 'i!=back'".
omgzor
Sorry, the code is buggy. Damn C++. Remove the empty parentheses after the declaration of `back`!
Konrad Rudolph