I've been googling around, reading through my book, and trying to write code that reads a text file and processes its words one by one, so I can put them in alphabetical order and keep a count of how many words were used and how often each word was used. I can't seem to get my GetNextWord() function to work properly, and it's driving me crazy.

I need to read the words in one by one and convert each uppercase letter to lowercase. I know how to do that and have done it successfully. It's reading the word character by character and putting it into a string that is holding me up.

This is my most recent try at it. Any help would be amazing, or a link to a tutorial on how to read from an input file word by word. (A word being the alpha characters a-z and ' (as in don't), ended by whitespace, comma, period, semicolon, colon, etc.)

void GetNextWord()
{
    string word = "";
    char c;

    while(inFile.get(c))
    {
        while( c > 64 && c < 123 || c == 39)
        {
            if((isupper(c)))
            {
                c = (tolower(c));
            }
            word = word + c;
        }
        outFile << word;
    }
}
+2  A: 

You can read the file word by word by using the >> operator. For example, see this link: http://www.daniweb.com/forums/thread30942.html.

I excerpted their example here:

ifstream in ( "somefile" );
vector<string> words;
string word;

if ( !in )
  return;

while ( in>> word )
  words.push_back ( word );
Scott Stafford
But that operator uses a different definition of what a word is from what was asked.
sbi
A: 

Personally I like to read in input with std::getline(std::istream&, std::string&) (in the <string> header, but you will of course also need to #include a stream header).

This function breaks on newline, which is whitespace by your problem's definition. But it's not the entire answer to your question. After reading in the line of text, you're going to need to use string operations or standard algorithms to break the string into words. Or you could loop over the string by hand.

The guts would be something like:

std::string buffer;
while (std::getline(std::cin, buffer)) {
// break each line into words, according to problem spec
}
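
The "break each line into words" step could look something like the sketch below. It assumes the asker's definition of a word (letters plus the apostrophe, everything else is a separator), and the name splitLine is a hypothetical helper, not something from the question.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into lowercase words, where a word
// is a run of letters and apostrophes (the asker's stated definition).
std::vector<std::string> splitLine(const std::string& line)
{
    std::vector<std::string> words;
    std::string current;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        // Cast to unsigned char before calling the <cctype> functions.
        const unsigned char ch = static_cast<unsigned char>(line[i]);
        if (std::isalpha(ch) || ch == '\'') {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            // A separator ends the word collected so far.
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        words.push_back(current);  // word that runs up to the end of the line
    return words;
}
```

Calling splitLine(buffer) inside the getline loop would then give you the words of each line, already lowercased.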
Max Lybbert
This could be problematic if there are hyphenated words in the text.
Space_C0wb0y
A hyphenated word like "back-scatter" doesn't matter, as the problem spec defines whether that is counted as one word or two. However, if I understand Space_C0wb0y correctly, words that are hyphenated to continue on the next line would require more logic than what I've shown. Since this program sounds a lot like homework, I doubt that will be valid input, but if it is, then there would be a need to handle such input.
Max Lybbert
A: 

I use

// str is a string that holds the line of data from ifs- the text file.
// str holds the words to be split, res the vector to store them in.
while( getline( ifs, str ) ) 
    split(str, res);


void split(const string& str, vector<string>& vec)
{
    typedef unsigned int uint;

    const string::size_type size(str.size());
    uint start(0);
    uint range(0);

 /* Explanation:
  * range - Length of the word to be extracted, without spaces.
  * start - Start of the next word. Initially index 0.
  *
  * Runs until it encounters whitespace, then extracts the word with substr()
  * and lower-cases all of its characters (without wasting time checking
  * whether they already are, as I feel a char-by-char check for upper case
  * takes just as much time as lowering them all anyway).
  */
    for( uint i(0); i < size; ++i )
    {
        if( isspace(str[i]) )
        {
            // range already counts every character of the word; range + 1 would also copy the whitespace.
            vec.push_back( toLower(str.substr(start, range)) );
            start = i + 1;
            range = 0;
        } else
            ++range;
    }
    vec.push_back( toLower(str.substr(start, range)) );
}

I'm not sure this is particularly helpful to you, but I'll try. toLower is a quick helper that simply applies the ::tolower() function to each character. The code reads each char until it hits whitespace, then stuffs the word into a vector. I'm not entirely sure what you mean by char by char.
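
For reference, a helper of that kind might look like the sketch below. The body is an assumption about its shape, since only the name toLower appears in the code above, and it uses a lambda, so it needs C++11 or later.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Assumed shape of the toLower helper: returns a lowercased copy of s.
std::string toLower(std::string s)
{
    // Go through unsigned char so std::tolower never sees a negative value.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return s;
}
```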

Do you want to extract a word one character at a time? Or do you want to check each character as you go along? Or do you mean you want to extract one word, finish, and then come back? If so, I would 1) recommend a vector anyway, and 2) ask you to let me know so I can refactor the code.

SoulBeaver
My original plan was to read in a word one char at a time, and when it hits whitespace or any punctuation it would stop reading the word, turn those chars into a string, and send that string to my other function for further processing, turning all uppercase into lowercase. I.e., "Don't" would become "don't".
MOneAtt
A: 

What's going to terminate your inner loop if c == 'a'? ASCII value for 'a' is 97.

Andrew Bainbridge
If c == 'a' then it's not going to terminate the inner loop. The inner loop terminates if the char is not A-Z, a-z, or '.
MOneAtt
+1  A: 

Your logic is wrong. The inner loop runs as long as c doesn't change, and there's nothing in it that would change c.

Why are you having two loops anyway? I think you might be confused about whether that function is supposed to read the next word or all words. Try to separate those concerns, put them into different functions (one of which is calling the other). I find it easiest to approach such problems in a top-down order:

while(inFile.good()) {
  std::string word = GetNextWord(inFile);
  if(!word.empty())
    std::cout << word << std::endl;
}

Now fill in the gaps by defining GetNextWord() to read everything up to the next word boundary.

sbi
I'll give this a try and report back with my results, Thanks
MOneAtt
Thank you! works like a charm!
MOneAtt