ansaurus

Question

C++ Reading in words from a text file, word by word or char by char

Answer 1

+2 A:

You can read the file word by word by using the >> operator. For example, see this link: http://www.daniweb.com/forums/thread30942.html.

I excerpted their example here:

ifstream in ( "somefile" );
vector<string> words;
string word;

if ( !in )
  return;  // could not open the file

// >> skips leading whitespace and reads up to the next whitespace
while ( in >> word )
  words.push_back ( word );

Scott Stafford 2010-09-15 04:27:17

But that operator uses a different definition of what a word is from what was asked.

sbi 2010-09-15 05:13:25

Answer 2

A:

Personally I like to read in input with std::getline(std::istream&, std::string&) (in the <string> header, but you will of course also need to #include a stream header).

This function breaks on newline, which is whitespace by your problem's definition. But it's not the entire answer to your question. After reading in the line of text, you're going to need to use string operations or standard algorithms to break the string into words. Or you could loop over the string by hand.

The guts would be something like:

std::string buffer;
while (std::getline(std::cin, buffer)) {
    // break each line into words, according to problem spec
}
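
One way to fill in that loop body, as a sketch only: the problem spec is not quoted in this thread, so this assumes a word is a run of letters and apostrophes (the definition the asker gives in a comment further down). The names isWordChar and splitLine are hypothetical.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Assumption: a "word" is a run of letters and apostrophes.
bool isWordChar(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) || c == '\'';
}

// Hypothetical helper: appends the words found in one line to `out`.
void splitLine(const std::string& line, std::vector<std::string>& out)
{
    std::string::size_type i = 0;
    while (i < line.size()) {
        // Skip separators (spaces, punctuation, digits, ...).
        while (i < line.size() && !isWordChar(line[i]))
            ++i;
        // Collect one word.
        const std::string::size_type start = i;
        while (i < line.size() && isWordChar(line[i]))
            ++i;
        if (i > start)
            out.push_back(line.substr(start, i - start));
    }
}
```

Under this assumption a hyphen is a separator, so "back-scatter" comes out as two words. If the spec counts it as one, adding '-' to isWordChar would be the place to change that.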

Max Lybbert 2010-09-15 07:52:41

This could be problematic if there are hyphenated words in the text.

Space_C0wb0y 2010-09-15 08:38:20

A hyphenated word like "back-scatter" doesn't matter, as the problem spec defines whether that is counted as one word or two. However, if I understand Space_C0wb0y correctly, words that are hyphenated to continue on the next line would require more logic than what I've shown. Since this program sounds a lot like homework, I doubt that will be valid input, but if it is, then there would be a need to handle such input.

Max Lybbert 2010-09-15 20:45:16

Answer 3

A:

I use

// str holds one line read from ifs (the text file);
// res is the vector that collects the words.
while( getline( ifs, str ) )
    split(str, res);


// Needs <cctype>, <string>, and <vector>; toLower is the helper described below.
void split(const string& str, vector<string>& vec)
{
    const string::size_type size(str.size());
    string::size_type start(0);
    string::size_type range(0);

 /* Explanation:
  * range - Length of the word to be extracted, without spaces.
  * start - Start of the next word. Initially index 0.
  *
  * Runs until it encounters whitespace, then extracts the word with substr()
  * and lower-cases it (without first checking whether the characters are
  * already lower-case, since a char-by-char check for upper-case would take
  * just as long as lowering them all anyway).
  */
    for( string::size_type i(0); i < size; ++i )
    {
        if( isspace(static_cast<unsigned char>(str[i])) )
        {
            if( range > 0 )   // skip empty words caused by consecutive whitespace
                vec.push_back( toLower(str.substr(start, range)) );
            start = i + 1;
            range = 0;
        } else
            ++range;
    }
    if( range > 0 )           // last word on the line, if any
        vec.push_back( toLower(str.substr(start, range)) );
}

I'm not sure this is particularly helpful to you, but I'll try. toLower is a small helper that simply applies ::tolower() to every character. split() scans each line character by character until it hits whitespace, then stuffs the word into a vector. I'm not entirely sure what you mean by char by char.
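
The toLower helper itself is not shown in this answer. A minimal sketch of what it might look like, assuming it takes a string by value and returns a lower-cased copy (the signature is a guess based on how split() calls it, and the lambda needs C++11):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical reconstruction of the toLower helper used by split().
std::string toLower(std::string s)
{
    // std::tolower expects a value representable as unsigned char, so take
    // each character as unsigned char to avoid undefined behavior on
    // negative char values.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}
```

Characters that are not upper-case letters, such as the apostrophe in "Don't", pass through unchanged.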

Do you want to extract a word one character at a time? Or do you want to check each character as you go along? Or do you mean you want to extract one word, finish processing it, and then come back for the next? If so, 1) I would recommend a vector anyway, and 2) let me know so I can refactor the code.

SoulBeaver 2010-09-15 07:57:35

My original plan was to read in a word one char at a time; when it hits whitespace or any punctuation, it would stop reading the word, turn those chars into a string, and send that string to my other function for further processing, turning all uppercase into lowercase. E.g. "Don't" would become "don't".

MOneAtt 2010-09-15 20:17:05

Answer 4

A:

What's going to terminate your inner loop if c == 'a'? ASCII value for 'a' is 97.

Andrew Bainbridge 2010-09-15 08:01:59

If c == 'a' then it's not going to terminate the inner loop. The inner loop terminates if the char is not A-Z, a-z, or '.

MOneAtt 2010-09-15 21:01:48

Answer 5

+1 A:

Your logic is wrong. The inner loop runs as long as c doesn't change, and there's nothing in it that would change c.

Why do you have two loops anyway? I think you might be confused about whether that function is supposed to read the next word or all words. Try to separate those concerns and put them into different functions (one of which calls the other). I find it easiest to approach such problems in a top-down order:

while(inFile.good()) {
  std::string word = GetNextWord(inFile);
  if(!word.empty())
    std::cout << word << std::endl;
}

Now fill in the gaps by defining GetNextWord() to read everything up to the next word boundary.

sbi 2010-09-15 08:24:50

I'll give this a try and report back with my results. Thanks!

MOneAtt 2010-09-15 20:24:06

Thank you! works like a charm!

MOneAtt 2010-09-15 21:00:39
