I am curious how I would go about reading input word by word from a text file with no set structure (such as notes or a small report). The text, for example, might be structured like this:

"06/05/1992
Today is a good day;
The worm has turned and the battle was won."

I was thinking of getting each line using getline and then splitting it into words on whitespace. Then I thought using strtok might work! However, I don't think that will work with the punctuation.

Another method I was thinking of was getting everything char by char and omitting the characters that were undesired. Yet that one seems unlikely.

So, to cut a long story short: is there an easy way to read input from a file and split it into words?

+3  A: 

Yes. You're looking for std::istream::operator>> :) Note that it will remove consecutive whitespace but I doubt that's a problem here.

i.e.

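#include <fstream>
#include <string>
#include <vector>

std::ifstream file("filename");
std::vector<std::string> words;
std::string currentWord;
while(file >> currentWord)   // >> skips whitespace; the loop ends at EOF or on error
    words.push_back(currentWord);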
Billy ONeal
To be honest, this way of doing it didn't even cross my mind. +1
Potatoswatter
I like this one. I need to break out of my vector safety zone, so Potatoswatter's solution gave me the best learning experience. Pretty hard to choose the solution when all these work just fine for my problem.
Trygle
A: 

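You can use the stream's getline member with a space character as the delimiter: getline(buffer,1000,' '); where buffer is a char array of at least 1000 characters. Note that this splits on spaces only, so a newline stays inside the extracted text.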

Or perhaps you can use this function to split a string into several parts, with a certain delimiter:

string StrPart(string s, char sep, int i) {
  string out="";
  int n=0, c=0;
  for (c=0;c<(int)s.length();c++) {
    if (s[c]==sep) {
      n+=1;
    } else {
      if (n==i) out+=s[c];
    }
  }
  return out;
}

Notes: This function assumes that you have declared using namespace std;.

s is the string to be split, sep is the delimiter, and i is the index of the part to get (0-based).

Alexander Rafferty
Billy ONeal
For a very short function, those variable names will do.
Alexander Rafferty
-1 for the terrible formatting. Spaces aid readability, as do parseable variable names. (The broader a variable's scope, the "better" its name should be. Function-scope variables should be at least a word.)
dash-tom-bang
+3  A: 

Since it's easier to write than to find the duplicate question,

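#include <cstddef>
#include <iostream>
#include <iterator>
#include <string>

// my_file_stream is an already-open input stream, e.g. a std::ifstream
std::istream_iterator<std::string> word_iter( my_file_stream ), word_iter_end;

std::size_t wordcnt = 0;
for ( ; word_iter != word_iter_end; ++ word_iter, ++ wordcnt ) {
    std::cout << "word " << wordcnt << ": " << * word_iter << '\n';
}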

The std::string template argument to istream_iterator tells it to return a string when you do *word_iter. Every time the iterator is incremented, it grabs another word from its stream.

If you have multiple iterators on the same stream at the same time, you can choose between data types to extract. However, in that case it may be easier just to use >> directly. The advantage of an iterator is that it can plug into the generic functions in <algorithm>.

Potatoswatter
+1 for the `istream_iterator` solution -- though I do note this might not be the best for a beginner :)
Billy ONeal
@Billy: Eh, I dunno. I think iterators are more fundamental than containers, and many beginners gloss over them and don't learn until they already have a body of code that passes `vector` everywhere.
Potatoswatter
~~`std::copy()`
wilhelmtell
@wilhelmtell: How exactly would you replicate the above using `std::copy`? There seems to be a bit more complicated logic going on inside the `for` than just copying...
Billy ONeal
oops i should have read through. pardon the fastest gun in the west who can't aim.
wilhelmtell
Well, I guess I should read into them.
Trygle
@Trygle: No, I should have explained. So now I did.
Potatoswatter
A: 

You can use the scanner technique to grab words, numbers, dates, etc. It is very simple and flexible. The scanner normally returns tokens (words, numbers, reals, keywords, etc.) to a parser.

If you later intend to interpret the words, I would recommend this approach.
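
A very small sketch of what such a scanner might look like is shown here. It is not taken from the book mentioned in this answer; the token categories, the `Token` struct and the `scan` function are all assumptions made for illustration, and it handles ASCII text only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for this sketch.
enum class TokenKind { Word, Number, Punctuation };

struct Token {
    TokenKind kind;
    std::string text;
};

// Walks the input character by character, skipping whitespace and grouping
// runs of letters into Word tokens and runs of digits into Number tokens.
// Any other non-space character becomes a one-character Punctuation token.
std::vector<Token> scan(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        std::size_t start = i;
        TokenKind kind;
        if (std::isalpha(c)) {
            kind = TokenKind::Word;
            while (i < input.size() &&
                   std::isalpha(static_cast<unsigned char>(input[i])))
                ++i;
        } else if (std::isdigit(c)) {
            kind = TokenKind::Number;
            while (i < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i])))
                ++i;
        } else {
            kind = TokenKind::Punctuation;
            ++i;
        }
        tokens.push_back(Token{kind, input.substr(start, i - start)});
    }
    return tokens;
}
```

A parser built on top of this could, for example, recognise the sequence Number, "/", Number, "/", Number as a date.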

I can warmly recommend the book "Writing Compilers and Interpreters" by Ronald Mak (Wiley Computer Publishing).

Max Kielland